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Summary 



' The finite element metliod can be viewed as a machine that automates the discretization of differential 

' equations, taking as input a variational problem, a finite element and a mesh, and producing as output 

' a system of discrete equations. However, the generality of the framework provided by the finite element 

(-H I method is seldom reflected in implementations (realizations), which are often specialized and can handle 

, only a small set of variational problems and finite elements (but are typically parametrized over the choice 

■ of mesh). 

This paper reviews ongoing research in the direction of a complete automation of the finite element method. 
In particular, this work discusses algorithms for the efficient and automatic computation of a system of 
discrete equations from a given variational problem, finite element and mesh. It is demonstrated that by 
automatically generating and compiling efficient low-level code, it is possible to parametrize a finite element 
^ ' code over variational problem and finite element in addition to the mesh. 

m ■ 
cn 

p 

(N ■ 1 INTRODUCTION 

The finite element method (Galerkin's method) has emerged as a universal method for 
the solution of differential equations. Much of the success of the finite element method 
can be contributed to its generality and simplicity, allowing a wide range of differential 
^ ' equations from all areas of science to be analyzed and solved within a common framework. 

Another contributing factor to the success of the finite element method is the flexibility of 
formulation, allowing the properties of the discretization to be controlled by the choice of 
finite element (approximating spaces). 

At the same time, the generality and flexibility of the finite element method has for a 
long time prevented its automation, since any computer code attempting to automate it 
must necessarily be parametrized over the choice of variational problem and finite element, 
which is difficult. Consequently, much of the work must still be done by hand, which is 
both tedious and error-prone, and results in long development times for simulation codes. 

Automating systems for the solution of differential equations are often met with skepti- 
cism, since it is believed that the generality and flexibility of such tools cannot be combined 
with the efficiency of competing specialized codes that only need to handle one equation 
for a single choice of finite element. However, as will be demonstrated in this paper, by 
automatically generating and compiling low-level code for any given equation and finite ele- 
ment, it is possible to develop systems that realize the generality and flexibility of the finite 
element method, while competing with or outperforming specialized and hand-optimized 
codes. 
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1.1 Automating the Finite Element Method 

To automate the finite element method, we need to build a machine that takes as input a 
discrete variational problem posed on a pair of discrete function spaces defined by a set of 
finite elements on a mesh, and generates as output a system of discrete equations for the 
degrees of freedom of the solution of the variational problem. Li particular, given a discrete 
variational problem of the form: Find U GVh such that 

aiU;v)=L{v) VveVh, (1) 

where a:V/iXV/i— )-Misa semilinear form which is linear in its second argument, L : V/i ^ M 
a linear form and (V/ii Vh) a given pair of discrete function spaces (the test and trial spaces), 
the machine should automatically generate the discrete system 

F{U) = 0, (2) 

where F :Vh^R^, N = \Vh\ = \Vh\ and 

Fi{U) = aiU-J^) - L{^i), i = l,2,...,iV, (3) 

for a given basis for Vh- 

Typically, the discrete variational problem ([1]) is obtained as the discrete version of a 
corresponding continuous variational problem: Find u (zV such that 

a{u]v)=L{v) yveV, (4) 

where Vh C V and Vh C V. 

The machine should also automatically generate the discrete representation of the lin- 
earization of the given semilinear form a, that is the matrix A € M^^^ defined by 

A,{U) = a'{U-Ji,(l),), i,j = l,2,...,N, (5) 

where a' : V/j x V/^ x V/j — ?> M is the Frechet derivative of o with respect to its first argument 
and and are bases for Vh and Vh respectively. 

In the simplest case of a linear variational problem, 

aiv,U)=L{v) yveVh, (6) 
the machine should automatically generate the linear system 

AU = b, (7) 

where Aij = a{4>i,(j)j) and bi = L{(l)i), and where (Ui) € M-^ is the vector of degrees of 
freedom for the discrete solution U, that is, the expansion coefficients in the given basis for 
Vh, 

N 

U = Y,Ui(t>i. (8) 

i=l 

We return to this in detail below and identify the key steps towards a complete automa- 
tion of the finite element method, including algorithms and prototype implementations for 
each of the key steps. 
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1.2 The FEniCS Project and the Automation of CMM 

The FEniCS project [601 ES] was initiated in 2003 with the explicit goal of developing free 
software for the Automation of Computational Mathematical Modeling (CMM), including 
a complete automation of the finite element method. As such, FEniCS serves as a prototype 
implementation of the methods and principles put forward in this paper. 

In [SB], an agenda for the automation of CMM is outlined, including the automation 
of (i) discretization, (ii) discrete solution, (iii) error control, (iv) modeling and (v) opti- 
mization. The automation of discretization amounts to the automatic generation of the 
system of discrete equations ([2]) or d?]) from a given given differential equation or varia- 
tional problem. Choosing as the foundation for the automation of discretization the finite 
element method, the first step towards the Automation of CMM is thus the automation of 
the finite element method. We continue the discussion on the automation of CMM below 
in Section [TTl 

Since the initiation of the FEniCS project in 2003, much progress has been made, espe- 
cially concerning the automation of discretization. In particular, two central components 
that automate central aspects of the finite element method have been developed. The first 
of these components is FIAT, the Finite element Automatic Tabulator [521 [521 El] ? which 
automates the generation of finite element basis functions for a large class of finite elements. 
The second component is FFC, the FEniCS Form Compiler [981 [571 [MllHS] , which automates 
the evaluation of variational problems by automatically generating low-level code for the 
assembly of the system of discrete equations from given input in the form of a variational 
problem and a (set of) finite element(s). 

In addition to FIAT and FFC, the FEniCS project develops components that wrap the 
functionality of collections of other FEniCS components (middleware) to provide simple, 
consistent and intuitive user interfaces for application programmers. One such example 
is DOLFIN \62\ [651 I63| . which provides both a C-I-+ and a Python interface (through 
SWIG to the basic functionality of FEniCS. 

We give more details below in Section [9] on FIAT, FFC, DOLFIN and other FEniCS 
components, but list here some of the key properties of the software components developed 
as part of the FEniCS project, as well as the FEniCS system as a whole: 

• automatic and efficient evaluation of variational problems through FFC [981 1571 [551 
I99j . including support for arbitrary mixed formulations; 

• automatic and efficient assembly of systems of discrete equations through DOLFIN [62l 

[M11H3]; 

• support for general families of finite elements, including continuous and discontinuous 
Lagrange finite elements of arbitrary degree on simplices through FIAT [831 [821 ; 

• high-performance parallel linear algebra through PETSc [9l [51 [10]: 

• arbitrary order multi-adaptive mcG(g')/mdG(g') and mono-adaptive cG((/)/dG(g) ODE 
solvers [Ml ESI [23 [SZl [M] • 

1.3 Automation and Mathematics Education 

By automating mathematical concepts, that is, implementing corresponding concepts in 
software, it is possible to raise the level of the mathematics education. An aspect of this is 
the possibility of allowing students to experiment with mathematical concepts and thereby 
obtaining an increased understanding (or familiarity) for the concepts. An illustrative 
example is the concept of a vector in M", which many students get to know very well 
through experimentation and exercises in Octave [37] or MATLAB jll8j . If asked which is 
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the true vector, the x on the blackboard or the x on the computer screen, many students 
(and the author) would point towards the computer. 

By automating the finite element method, much like linear algebra has been automated 
before, new advances can be brought to the mathematics education. One example of this 
is Puffin \7U\ I69j . which is a minimal and educational implementation of the basic func- 
tionality of FEniCS for Octave/MATLAB. Puffin has successfully been used in a number 
of courses at Chalmers in Goteborg and the Royal Institute of Technology in Stockholm, 
ranging from introductory undergraduate courses to advanced undergraduate/beginning 
graduate courses. Using Puffin, first-year undergraduate students are able to design and 
implement solvers for coupled systems of convection-diffusion-reaction equations, and thus 
obtaining important understanding of mathematical modeling, differential equations, the 
finite element method and programming, without reducing the students to button-pushers. 

Using the computer as an integrated part of the mathematics education constitutes a 
change of paradigm |67j . which will have profound influence on future mathematics educa- 
tion. 

1.4 Outline 

This paper is organized as follows. In the next section, we first present a survey of existing 
finite element software that automate particular aspects of the finite element method. In 
Section [3l we then give an introduction to the finite element method with special emphasis 
on the process of generating the system of discrete equations from a given variational 
problem, finite element (s) and mesh. A summary of the notation can be found at the end 
of this paper. 

Having thus set the stage for our main topic, we next identify in Sections HHHl the 
key steps towards an automation of the finite element method and present algorithms and 
systems that accomplish (in part) the automation of each of these key steps. We also discuss 
a framework for generating an optimized computation from these algorithms in Section [71 In 
Section [HI we then highlight a number of important concepts and techniques from software 
engineering that play an important role for the automation of the finite element method. 

Prototype implementations of the algorithms are then discussed in Section [9l including 
benchmark results that demonstrate the efficiency of the algorithms and their implementa- 
tions. We then, in Section [TU[ present a number of examples to illustrate the benefits of a 
system automating the finite element method. As an outlook towards further research, we 
present in Section [11] an agenda for the development of an extended automating system for 
the Automation of CMM, for which the automation of the finite element method plays a 
central role. Finally, we summarize our findings in Section [H[ 

2 SURVEY OF CURRENT FINITE ELEMENT SOFTWARE 

There exist today a number of projects that seek to create systems that (in part) automate 
the finite element method. In this section, we survey some of these projects. A complete 
survey is difficult to make because of the large number of such projects. The survey is 
instead limited to a small set of projects that have attracted the attention of the author. 
In effect, this means that most proprietary systems have been excluded from this survey. 

It is instructional to group the systems both by their level of automation and their 
design. In particular, a number of systems provide automated generation of the system 
of discrete equations from a given variational problem, which we in this paper refer to as 
the automation of the finite element method or automatic assembly, while other systems 
only provide the user with a basic toolkit for this purpose. Grouping the systems by their 
design, we shall differentiate between systems that provide their functionality in the form of 
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a library in an existing language and systems that implement new domain-specific languages 
for finite element computation. A summary for the surveyed systems is given in Table [TJ 

It is also instructional to compare the basic specification of a simple test problem such 
as Poisson's equation, —Au = / in some domain Q C M'^, for the surveyed systems, or more 
precisely, the specification of the corresponding discrete variational problem a{v, U) = L{v) 
for all V in some suitable test space, with the bilinear form a given by 

a{v,U) = f Vv Vf/dx, (9) 
Jo. 

and the linear form L given by 

L{v) = [ vfdx. (10) 
Jn 

Each of the surveyed systems allow the specification of the variational problem for 
Poisson's equation with varying degree of automation. Some of the systems provide a 
high level of automation and allow the variational problem to be specified in a notation 
that is very close to the mathematical notation used in ([9]) and (|10p . while others require 
more user-intervention. In connection to the presentation of each of the surveyed systems 
below, we include as an illustration the specification of the variational problem for Poisson's 
equation in the notation employed by the system in question. In all cases, we include only 
the part of the code essential to the specification of the variational problem. Since the 
different systems are implemented in different languages, sometimes even providing new 
domain-specific languages, and since there are differences in philosophies, basic concepts 
and capabilities, it is difficult to make a uniform presentation. As a result, not all the 
examples specify exactly the same problem. 



Project 


automatic assembly 


library / language 


license 


Analysa 


yes 


language 


proprietary 


deal.II 


no 


library 


QPLi 


Diffpack 


no 


library 


proprietary 


FEniCS 


yes 


both 


GPL, LGPL 


PreeFEM 


yes 


language 


LGPL 


GetDP 


yes 


language 


GPL 


Sundance 


yes 


library 


LGPL 



Table 1. Summary of projects seeking to automate the finite element method. 



2.1 Analysa 

Analysa [6l [7] is a domain-specific language and problem-solving environment (PSE) for 
partial differential equations. Analysa is based on the functional language Scheme and 
provides a language for the definition of variational problems. Analysa thus falls into the 
category of domain-specific languages. 

Analysa puts forward the idea that it is sometimes desirable to compute the action of 
a bilinear form, rather than assembling the matrix representing the bilinear form in the 
current basis. In the notation of [7], the action of a bilinear form a:V/jXV/j— T-Mona 
given discrete function U G Vh is 

w = a{Vh,U) £R^, (11) 



^In addition to the terms imposed by the QPL, the deal.II license imposes a form of advertising clause, 
requiring the citation of certain publications. See [12] for details. 
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where 

Wi = a{4>i,U), i = l,2,...,N. (12) 

Of course, we have w = AU, where A is the matrix representing the bihnear form, with 
Aij = a{4>i,(l)j), and (C/j) G is the vector of expansion coefficients for U in the basis of 
Vh- It follows that 

w = a{Vh,U) = a{Vh,Vh)U. (13) 

If the action only needs to be evaluated a few times for different discrete functions U 
before updating a linearization (reassembling the matrix A), it might be more efficient to 
compute each action directly than first assembling the matrix A and applying it to each U. 

To specify the variational problem for Poisson's equation with Analysa, one specifies a 
pair of bilinear forms a and m, where a represents the bilinear form a in ([9|) and m represents 
the bilinear form 

m{v, U) = I vUdx, (14) 
Jn 

corresponding to a mass matrix. In the language of Analysa, the linear form L in (|10p is 
represented as the application of the bilinear form m on the test space Vh and the right-hand 
side /, 

L{^i) = m{VhJh i = l,2,...,iV, (15) 

as shown in Table [2j Note that Analysa thus defers the coupling of the forms and the test 
and trial spaces until the computation of the system of discrete equations. 

(integral-forms 

( (a V U) (dot (gradient v) (gradient U) ) ) 
((m V U) (* V U)) 

) 

(elements 

(element (lagrange-simplex 1)) 

) 

(spaces 

(test-space (fe element (all mesh) r:)) 
(trial-space (fe element (all mesh) r:)) 

) 

(functions 

(f (interpolant test-space (...))) 

) 

(define A-matrix (a testspace trial-space)) 
(define b-vector (m testspace f)) 

Table 2. Specifying the variational problem for Poisson's equation with Analysa 
using piecewise linear elements on simplices (triangles or tetrahedra) . 

2.2 deal.II 

deal. II \12\ [T3l [TT] is a C-|— I- library for finite element computation. While providing 
tools for finite elements, meshes and linear algebra, deal.II does not provide support for 
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automatic assembly. Instead, a user needs to supply the complete code for the assembly 
of the system ([7]), including the explicit computation of the element stiffness matrix (see 
Section [3] below) by quadrature, and the insertion of each element stiffness matrix into the 
global matrix, as illustrated in Table [3j This is a common design for many finite element 
libraries, where the ambition is not to automate the finite element method, but only to 
provide a set of basic tools. 



for (dof_handler .begin_active() ; cell! = dof _handler . end() ; ++cell) 
{ 

for (unsigned int i = 0; i < dof s_per_cell ; ++i) 
for (unsigned int j = 0; j < dof s_per_cell ; ++j) 

for (unsigned int q_point = 0; q_point < n_q_points; ++q_point) 
cell_matrix(i , j) += (f e_values . shape_grad (i, q_point) * 

f e_values . shape_grad (j , q_point) * 
f e_values . JxW(q_point) ) ; 

for (unsigned int i = 0; i < dof s_per_cell ; ++i) 

for (unsigned int q_point = 0; q_point < n_q_points; ++q_point) 
cell_rhs(i) += (f e_values . shape_value (i, q_point) * 
<value of right-hand side f> * 
f e_values . JxW(q_point) ) ; 

cell->get_dof _indices (local_dof _indices) ; 

for (unsigned int i = 0; i < dof s_per_cell ; ++i) 
for (unsigned int j = 0; j < dof s_per_cell ; ++j) 
system_matrix . add(local_dof _indices [i] , 
local_dof _indices [j] , 
cell_matrix(i, j)); 

for (unsigned int i = 0; i < dof s_per_cell ; ++i) 
system_rhs (local_dof _indices [i] ) += cell_rhs(i); 

> 



Table 3. Assembling the linear system ^ for Poisson's equation with deal. II. 



2.3 DifTpack 

Diffpack \24\ [92] is a C++ library for finite element and finite difference solution of partial 
differential equations. Initiated in 1991, in a time when most finite element codes were 
written in FORTRAN, Diffpack was one of the pioneering libraries for scientific computing 
with C++. Although originally released as free software, Diffpack is now a proprietary 
product. 

Much like deal.II, Diffpack requires the user to supply the code for the computation 
of the element stiffness matrix, but automatically handles the loop over quadrature points 
and the insertion of the element stiffness matrix into the global matrix, as illustrated in 
Table H 
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for (int i = 


1 ; i <= nbf ; i++) 






for (int j 


= 1; j <= nbf; 






elmat . A( 


i, j) += (fe.dN(i, 1) 


* fe.dN(j , 


1) + 




fe.dN(i, 2) 


* fe.dN(j , 


2) + 




fe.dN(i, 3) 


* fe.dN(j , 


3)) * detJxW; 


for (int i = 


1; i <= nbf; i++) 






elmat .b(i) 


+= fe.N(i)*<value of 


right-hand 


side f>*detJxW; 



Table 4. Computing the element stiffness matrix and element load vector for 
Poisson's equation with Diffpack. 



2.4 FEniCS 

The FEniCS project |60( [36] is structured as a system of interoperable components that 
automate central aspects of the finite element method. One of these components is the form 
compiler FFC [Ml [871 IMl [99] , which takes as input a variational problem together with a set 
of finite elements and generates low-level code for the automatic computation of the system 
of discrete equations. In this regard, the FEniCS system implements a domain-specific 
language for finite element computation, since the form is entered in a special language 
interpreted by the compiler. On the other hand, the form compiler FFC is also available 
as a Python module and can be used as a just-in-time (JIT) compiler, allowing variational 
problems to be specified and computed with from within the Python scripting environment. 
The FEniCS system thus falls into both categories of being a library and a domain-specific 
language, depending on which interface is used. 

To specify the variational problem for Poisson's equation with FEniCS, one must specify 
a pair of basis functions v and U, the right-hand side function f , and of course the bilinear 
form a and the linear form L, as shown in Table [5j 



element = FiniteElementC 'Lagrange' ' , 


' 'tetrahedron' ' 


, 1) 


V = BasisFunction(element) 
U = BasisFunction(element) 
f = Function(element) 






a = dot(grad(v), grad(U))*dx 
L = v*f*dx 







Table 5. Specifying the variational problem for Poisson's equation with FEniCS 
using piecewise linear elements on tetrahedra. 



Note in Table[5]that the function spaces (finite elements) for the test and trial functions v 
and U together with all additional functions/coefficients (in this case the right-hand side f ) 
are fixed at compile-time, which allows the generation of very efficient low-level code since 
the code can be generated for the specific given variational problem and the specific given 
finite element (s). 

Just like Analysa, FEniCS (or FFC) supports the specification of actions, but while 
Analysa allows the specification of a general expression that can later be treated as a 
bilinear form, by applying it to a pair of function spaces, or as a linear form, by applying 
it to a function space and a given fixed function, the arity of the form must be known at 
the time of specification in the form language of FFC. As an example, the specification of 
a linear form a representing the action of the bilinear form Q on a function U is given in 
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Table El 



element = FiniteElementC 'Lagrange' ' , 


' 'tetrahedron' ' 


, 1) 


V = BasisFunction(element) 
U = Function(element) 






a = dot(grad(v), grad(U))*dx 







Table 6. Specifying the linear form for the action of the bihnear form ® with 
FEniCS using piecewise hnear elements on tetrahedra. 



A more detailed account of the various components of the FEniCS project is given below 
in Section [9l 

2.5 FreeFEM 

PreeFEM |1U8[ [55] implements a domain-specific language for finite element solution of 
partial differential equations. The language is based on C++, extended with a special 
language that allows the specification of variational problems. In this respect, FreeFEM 
is a compiler, but it also provides an integrated development environment (IDE) in which 
programs can be entered, compiled (with a special compiler) and executed. Visualization 
of solutions is also provided. 

FreeFEM comes in two flavors, the current version FreeFEM++ which only supports 
2D problems and the 3D version FreeFEMSD. Support for 3D problems will be added to 
FreeFEM++ in the future. ^TOE\ . 

To specify the variational problem for Poisson's equation with FreeFEM++, one must 
first define the test and trial spaces (which we here take to be the same space V), and then 
the test and trial functions v and U, as well as the function f for the right-hand side. One 
may then define the bilinear form a and linear form L as illustrated in Table [71 



fespace V(mesh, PI) ; 

V V, U; 

func f = . . . ; 

varform a(v, U) = int2d(mesh) (dx(v)*dx(U) + dy (v) *dy (U) ) ; 
varform L(v) = int2d(mesh) (v*f ) ; 

Table 7. Specifying the variational problem for Poisson's equation with 
FreeFEMH — h using piecewise linear elements on triangles (as deter- 
mined by the mesh). 

2.6 GetDP 

GetDP |35[ is a finite element solver which provides a special declarative language for 
the specification of variational problems. Unlike FreeFEM, GetDP is not a compiler, nor is 
it a library, but it will be classified here under the category of domain-specific languages. 
At start-up, GetDP parses a problem specification from a given input file and then proceeds 
according to the specification. 

To specify the variational problem for Poisson's equation with GetDP, one must first give 
a definition of a function space, which may include constraints and definition of sub spaces. 
A variational problem may then be specified in terms of functions from the previously 
defined function spaces, as illustrated in Table [HI 
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FunctionSpace { 

{ Name V; Type FormO; 
BasisFunction { 
{ ... } 

} 

} 

> 

Formulation { 

{ Name Poisson; Type FemEquation; 
Quantity { 

{ Name v; Type Local; NameOf Space V; } 

} 

Equation { 

Galerkin { [Dof{Grad v}, {Grad v>] ; 

> 

} 

} 

> 

Table 8. Specifying the bilinear form for Poisson's equation with GetDP. 



2.7 Sundance 

Sundance [1031 1101^ I102j is a C++ library for finite element solution of partial differential 
equations (PDEs), with special emphasis on large-scale PDE-constrained optimization. 

Sundance supports automatic generation of the system of discrete equations from a 
given variational problem and has a powerful symbolic engine, which allows variational 
problems to be specified and differentiated symbolically natively in C++. Sundance thus 
falls into the category of systems providing their functionality in the form of library. 

To specify the variational problem for Poisson's equation with Sundance, one must 
specify a test function v, an unknown function U, the right-hand side f , the differential 
operator grad and the variational problem written in the form a{v, U) — L{v) = 0, as 
shown in Tabled 



Expr V = new TestFunction(new Lagrange ( 1 )) ; 
Expr U = new UnknownFunction(new Lagrange ( 1 )) ; 
Expr f = new DiscreteFunction( . . .) ; 

Expr dx = new Derivative (0) ; 
Expr dy = new Derivative(l) ; 
Expr dz = new Derivative (2) ; 
Expr grad = List(dx, dy, dz) ; 

Expr poisson = Integral ((grad*v)*(grad*U) - v*f); 

Table 9. Specifying the variational problem for Poisson's equation with Sun- 
dance using piecewise linear elements on tetrahedra (as determined by 
the mesh). 
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3 THE FINITE ELEMENT METHOD 

It once happened that a man thought he had written original 
verses, and was then found to have read them word for word, 
long before, in some ancient poet. 

Gottfried Wilhelm Leibniz 
Nouveaux essais sur Ventendement humain (1704/1764) 

In this section, we give an overview of the finite element method, with special focus on 
the general algorithmic aspects that form the basis for its automation. In many ways, the 
material is standard [Ml [UTl [261 [271 [HI El [201 [391 [IE], but it is presented here to give 
a background for the continued discussion on the automation of the finite element method 
and to summarize the notation used throughout the remainder of this paper. The purpose 
is also to make precise what we set out to automate, including assumptions and limitations. 

3.1 Galerkin's Method 

Galerkin's method (the weighted residual method) was originally formulated with global 
polynomial spaces |S7] and goes back to the variational principles of Leibniz, Euler, La- 
grange, Dirichlet, Hamilton, Castigliano [22], Rayleigh |112| and Ritz |113j . Galerkin's 
method with piecewise polynomial spaces (V/i, V/i) is known as the finite element method. 
The finite element method was introduced by engineers for structural analysis in the 1950s 
and was independently proposed by Courant in 1943 |3Uj . The exploitation of the finite 
element method among engineers and mathematicians exploded in the 1960s. In addition 
to the references listed above, we point to the following general references: |381 H5l [l3l 

[Ml [m [mill]. 

We shall refer to the family of Galerkin methods (weighted residual methods) with 
piecewise (polynomial) function spaces as the finite element method, including Petrov- 
Galerkin methods (with different test and trial spaces) and Galer kin/least-squares methods. 

3.2 Finite Element Function Spaces 

A central aspect of the finite element method is the construction of discrete function spaces 
by piecing together local function spaces on the cells {K}k£^T of & mesh T of a domain 
Q. = UxeT C M'^, with each local function space defined by a finite element. 

3.2.1 The finite element 

We shall use the standard Ciarlet |271 [20] definition of a finite element, which reads as 
follows. A finite element is a triple {K,Vk ^J^k), where 

• K C R'^ is a bounded closed subset of M'^ with nonempty interior and piecewise 
smooth boundary; 

• Vk is a function space on K of dimension < oo; 

• J^K = {'^f" > ^ • • • 1 ^n^} ^ basis for "P^ (the bounded linear functionals on Vk). 

We shall further assume that we are given a nodal basis {(j)f}^J^i for Vk that for each node 
€ Mk satisfies {(p^) = 5ij for j = 1, 2, . . . , uk- Note that this implies that for any 
V G Vk, we have 

UK 

v = ^i^f{v)cPf. (16) 
i=l 
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In the simplest case, the nodes are given by evaluation of function values or directional 
derivatives at a set of points {a;f^}"i\, that is, 

i^f{v) = vixf), i = 1,2,..., riK- (17) 

3.2.2 The local-to-global mapping 

Now, to define a global function space = span{(/)j}^^ on Q, and a set of global nodes 
J\f = {iyi}fLi from a given set {{K, Vk^-N'k)}k&t of finite elements, we also need to specify 
how the local function spaces are pieced together. We do this by specifying for each cell 
i^T E T a local-to- global mapping, 

iK .[^^nK]^ N, (18) 

that specifies how the local nodes Mk are mapped to global nodes J\f, or more precisely, 

'^i.K{i)i^) = ^fi'^W), i = 1,2,. .. ,nK, (19) 

for any v G Vh, that is, each local node f/^ € Mk corresponds to a global node i^tj^(i) G A/" 
determined by the local-to- global mapping lk- 

3. 2. 3 The global function space 

We now define the global function space Vh as the set of functions on satisfying 

v\k e Vk yK e r, (20) 

and furthermore satisfying the constraint that if for any pair of cells {K, K') & T x T and 
local node numbers £ [l,nK] x [l,ni<-'], we have 

tKii) = iK'{i'), (21) 

then 

i^Hv\K) = i^!f'iv\K'), (22) 

where v\k denotes the continuous extension to K of the restriction of v to the interior of 
K, that is, if two local nodes f/^ and are mapped to the same global node, then they 
must agree for each function v £ Vh. 

Note that by this construction, the functions of Vh are undefined on cell boundaries, 
unless the constraints (|22p force the (restrictions of) functions of Vh to be continuous on 
cell boundaries, in which case we may uniquely define the functions of Vh on the entire 
domain i}. However, this is usually not a problem, since we can perform all operations on 
the restrictions of functions to the local cells. 

3.2.4 Lagrange finite elements 

The basic example of finite element function spaces is given by the family of Lagrange finite 
elements on simplices in W^. A Lagrange finite element is given by a triple {K,Vk ,J^k), 
where the K is a simplex in (a line in R, a triangle in M2, a tetrahedron in M^), Vk is 
the space Pq{K) of scalar polynomials of degree < q on K and each vf^ E Mk is given by 
point evaluation at some point xf E K, as illustrated in Figure [1] for q = 1 and g = 2 on a 
triangulation of some domain C M^. Note that by the placement of the points {xf}^^^ 
at the vertices and edge midpoints of each cell K, the global function space is the set of 
continuous piecewise polynomials of degree q = I and q = 2 respectively. 



Automating the Finite Element Method 



13 




3.2.5 The reference finite element 

As we have seen, a global discrete function space Vh may be described by a mesh 7~, a set 
of finite elements {{K ,V k ■, J^k)} K&T and a set of local-to-global mappings {iK}KeT- We 
may simplify this description further by introducing a reference finite element {Kq,Vq,Mo), 
where A/q = . . . and a set of invertible mappings {Fk}k&t that map the 

reference cell Kq to the cells of the mesh, 

K = Fk{Ko) ^KeT, (23) 

as illustrated in Figure [2j Note that Kq is generally not part of the mesh. Typically, 
the mappings {FK}KeT a-re affine, that is, each Fk can be written in the form Fk{X) = 
AkX + bx for some matrix Ak € R'^^'^ and some vector hx G R"^, or isoparametric, in 
which case the components of Fk are functions in Vq. 

For each cell K G T, the mapping Fk generates a function space on Fk given by 

Vk = {v = voo : G Po}, (24) 

that is, each function v = v{x) may be written in the form v{x) = vq{F^^ (x)) = vooF^^{x) 
for some t^o G Pq- 

Similarly, we may also generate for each AT G T a set of nodes Mk on Vk given by 

MK = {i^f ■■iy!'iv) = u^{voFK), i = l,2,...,no}. (25) 

Using the set of mappings {Fk}k£T^ we may thus generate from the reference finite ele- 
ment {Kq^VqjMo) a set of finite elements {{K,Vk ,J^k)}k<=t given by 

K = Fk{Ko), 

Vk = {v = voo F~^ : t;o G Vq}, (26) 
Mk = {lyf : i^!^{v) = u^{v oFk), i = 1, 2, . . . , no = uk}. 

With this construction, it is also simple to generate a set of nodal basis functions {4'f}^=i on 
K from a set of nodal basis functions {^J'i}"^! on the reference element satisfying = 
6ij. Noting that if (pf = o F^^ for i = 1, 2, . . . , uk, then 

(</'f ) = i^^i^f ° Fk) = ly^^j) = d,j, (27) 
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Xi = (0,0) X2 = (l,0) 



Figure 2. The (affine) mapping Fk from a reference cell Kc, to some cell K £ T. 



SO {</'f"}"=i is a nodal basis for Vk- 

Note that not all finite elements may be generated from a reference finite element using 
this simple construction. For example, this construction fails for the family of Hermite 
finite elements [26 \ \27 \ [20]. Other examples include H(div) and H(cm\) conforming finite 
elements (preserving the divergence and the curl respectively over cell boundaries) which 
require a special mapping of the basis functions from the reference element. 

However, we shall limit our current discussion to finite elements that can be generated 
from a reference finite element according to (j26p , which includes all affine and isoparametric 
finite elements with nodes given by point evaluation such as the family of Lagrange finite 
elements on simplices. 

We may thus define a discrete function space by specifying a mesh T, a reference finite 
element {K,Vo,Afo), a set of local-to-global mappings {lk}k<^t and a set of mappings 
{Fk}k£T from the reference cell Kq, as demonstrated in Figured Note that in general, 
the mappings need not be of the same type for all cells K and not all finite elements need 
to be generated from the same reference finite element. In particular, one could employ a 
different (higher-degree) isoparametric mapping for cells on a curved boundary. 

3.3 The Variational Problem 

We shall assume that we are given a set of discrete function spaces defined by a correspond- 
ing set of finite elements on some triangulation T of a domain C M'^. In particular, we 
are given a pair of function spaces. 



Vh = span{0i},^i, 
Vh = span{0i},^i. 



(28) 



Automating the Finite Element Method 



15 




Figure 3. Piecing together local function spaces on the cells of a mesh to form 
a discrete function space on fi, generated by a reference finite element 
{KojVojJ^q), a set of local-to-global mappings {tif}/fer and a set of 
mappings {FkIxgT- 



which we refer to as the test and trial spaces respectively. 

We shall also assume that we are given a variational problem of the form: Find [/ G V/j 
such that 

a{U;v)=L{v) Vf G T4, (29) 

where a : V/i x V/i — ?> M is a semilinear form which is linear in its second argumen10 and 
L : V/i — 7> M is a linear form (functional). Typically, the forms a and L of ()29p are defined 
in terms of integrals over the domain il. or subsets of the boundary of il. 

3.3.1 Nonlinear variational problems 

The variational problem (|29|) gives rise to a system of discrete equations, 

F{U) = 0, (31) 

for the vector (^7^) G of de grees of freedom of the solution U = Ylii=i Ui(t)i G Vh, where 

Fi{U) = a{U;^i)-L{^i), i = 1, 2, . . . , TV. (32) 

It may also be desirable to compute the Jacobian A = F' oi the nonlinear system pip 
for use in a Newton's method. We note that if the semilinear form a is differentiable in U , 

■^We shall use the convention that a semilinear form is linear in each of the arguments appearing after 
the semicolon. Furthermore, if a semilinear form a with two arguments is linear in both its arguments, we 
shall use the notation 

a{v,U) = a'{U;v,U) = a'{U;v)U, (30) 

where a' is the Frechet derivative of a with respect to U , that is, we write the bilinear form with the test 
function as its first argument. 
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(36) 



then the entries of the Jacobian A are given by 

a TP (TJ\ fITT 

Aij = = Q^a{U; 4>^) = a{U- ^i)— = a'{U; ^i)(t), = a'{U; 4>i, ^j). (33) 

As an example, consider the nonUnear Poisson's equation 

-S/-{{l+u)Vu) = f inn, 

u = ondn. ^ ' 

Multiplying (j34p with a test function v and integrating by parts, we obtain 

/ Vv ■ {{l + u)Vu)dx = / vfdx, (35) 

and thus a discrete nonlinear variational problem of the form (j29p . where 

a{U;v)= [ Vv- {{l + U)VU)dx, 
Jn 

L{v) = V f dx. 
Jn 

Linearizing the semilinear form a around U, we obtain 

a'iU;v,w)= [ Vv{wVU)dx+ [ Vv ■ {{1 + U)Vw) dx, (37) 
Jn Jn 

for any w (zVh. In particular, the entries of the Jacobian matrix A are given by 

Aj = a'{U; 0„ <i)j) = /" • {(t)jVU) dx + [ ■ ((1 + C/)V0j) dx. (38) 
Jn Jn 

3.3.2 Linear variational problems 

If the variational problem (j29p is linear, the nonlinear system (|3ip is reduced to the linear 
system 

AU = b, (39) 
for the degrees of freedom (Ui) £ M^, where 

Aij = a{4)i,^j), ^^^^ 
hi = L{4)i). 

Note the relation to (j33|) in that Aij = a{(j)i,(l)j) = a'{U ; (pi, (pj). 

In Section [21 we saw the canonical example of a linear variational problem with Poisson's 
equation, 

—An = f in Q, , , 

(41) 

-u = on aO, ^ ^ 

corresponding to a discrete linear variational problem of the form (j29p . where 

a{v,U) = Vv- VUdx, 

Jn (42) 
L{v) = V f dx. 
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3.4 Multilinear Forms 

We find that for botli nonlinear and linear problems, the system of discrete equations is 
obtained from the given variational problem by evaluating a set of multilinear forms on 
the set of basis functions. Noting that the semilinear form a of the nonlinear variational 
problem ([29]) is a linear form for any given fixed U & Vh and that the form a for a linear 
variational problem can be expressed as a{v,U) = a'{U;v,U), we thus need to be able to 
evaluate the following multilinear forms: 

a{U; ■):Vh^ M, 

a'{U,;-):VhxVh^R. 
We shall therefore consider the evaluation of general multilinear forms of arity r > 0, 

a : X If X . . . X V;: ^ M, (44) 

defined on the product space Vjl xVj^ x ■ ■ ■ x Vj^ of a given set {Vj^Yj^i of discrete function 

spaces on a triangulation T of a domain Q, C M'^. In the simplest case, all function spaces 
are equal but there are many important examples, such as mixed methods, where it is 
important to consider arguments coming from different function spaces. We shall restrict 
our attention to multilinear forms expressed as integrals over the domain 0, (or subsets of 
its boundary). 

Let now {cj)j}fj^, Wjj^J^, be bases of Vjl,V^,..., respectively and let 
i = {ii,i2, ■ ■ ■ ,ir) be a multiindex of length |z| = r. The multilinear form a then defines a 
rank r tensor given by 

Ai = aicPl,cPl,...,<Pl) ViSZ, (45) 

where I is the index set 

I = fl[lAVi\] = {{l,l,...,l),{l,h---,2),---,{N\N^...,Nn}. (46) 
i=i 

For any given multilinear form of arity r, the tensor A is a (typically sparse) tensor of 
rank r and dimension (|V^i|, \V^\, \V,^\) = {N^,N^, iV^). 

Typically, the arity of the multilinear form a is r = 2, that is, a is a bilinear form, in 
which case the corresponding tensor ^4 is a matrix (the "stiffness matrix"), or the arity of 
the multilinear form a is r = 1, that is, a is a linear form, in which case the corresponding 
tensor ^ is a vector ( "the load vector" ) . 

Sometimes it may also be of interest to consider forms of higher arity. As an example, 
consider the discrete trilinear form a : Vj^ x x Vf — > M associated with the weighted 
Poisson's equation —V • {wVu) = f. The trilinear form a is given by 

a{v,U,w) = / wVv-VUdx, (47) 
Jn 

for w = X^^i Wicfif G Vf a given discrete weight function. The corresponding rank three 
tensor is given by 

A = [ cl>lV<pl.V<pldx. (48) 



(43) 
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Noting that for any w = X^^i the tensor contraction A : w = Ai^^i^i^Wi^ 

is a matrix, we may thus obtain the solution U by solving the linear system 



«1«2 



{A:w)U = b, (49) 

where bi = L{(j)j^) = J^(j)l^fdx. Of course, if the solution is needed only for one single 
weight function zu, it is more efficient to consider w as a fixed function and directly compute 
the matrix A associated with the bilinear form a{-,-,w). In some cases, it may even be 
desirable to consider the function U as being fixed and directly compute a vector A (the 
action) associated with the linear form a{-,U,w), as discussed above in Section [2.11 It is 
thus important to consider multilinear forms of general arity r. 

3.5 Assembling the Discrete System 

The standard algorithm |12H I7 H I92j for computing the tensor A is known as assembly; the 
tensor is computed by iterating over the cells of the mesh T and adding from each cell the 
local contribution to the global tensor A. 

To explain how the standard assembly algorithm applies to the computation of the 
tensor A defined in (|45p from a given multilinear form a, we note that if the multilinear 
form a is expressed as an integral over the domain Q, we can write the multilinear form as 
a sum of element multilinear forms, 



J2 (50) 

and thus 



a = 



M = J^a^«, 02^,. ..,<). (51) 

We note that in the case of Poisson's equation, — Au = /, the element bilinear form aK is 
given by axiv, U) = V-y • S/U dx. 

We now let t]^ : [l,n]^] [1,-^-'] denote the local-to-global mapping introduced above 
in Section for each discrete function space V"^, j = 1, 2, . . . , r, and define for each K & T 
the collective local-to-global mapping lk '■ ^ 2^ by 

i^K^i) = {i'K{ii),tK{i2),---,iKih)) yielK, (52) 
where is the index set 

r 

lK = ll[l,\V{\] = {{l,l,...,l),{l,l,...,2),...,{n]„nj„...,n'K)}. (53) 

Furthermore, for each we let {4>i^'''}^^i denote the restriction to an element K of the 
subset of the basis of Vj^ supported on K, and for each i G T we let 71 C T denote 

the subset of cells on which all of the basis functions {(pl Yj^-^ are supported. 

We may now compute the tensor A by summing the contributions from each local cell K, 



E/,K,1 :K,2 ,K,r 



(54) 
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This computation may be carried out efficiently by iterating once over all cells K € T and 
adding the contribution from each K to every entry Ai of A such that K € 7i, as illustrated 
in Algorithm [TJ In particular, we never need to form the set %, which is implicit through 
the set of local-to-global mappings {iK}KeT- 



Algorithm 1 A = Assemble(a, {V^}^^^, {LK}KeTj T) 

yl = 
for K eT 

for i € Xk 

end for 
end for 



The assembly algorithm may be improved by defining the element tensor A by 

= aK(<^f'\</)f',...>J'0 ^^^^K. (55) 

For any multilinear form of arity r, the element tensor A^ is a (typically dense) tensor of 
rank r and dimension {n\^,n\, . . . , nj^). 

By computing first on each cell K the element tensor A^ before adding the entries to 
the tensor A as in Algorithm [21 one may take advantage of optimized library routines for 
performing each of the two steps. Note that Algorithm [2] is independent of the algorithm 
used to compute the element tensor. 



Algorithm 2 A = Assemble(a, {iK}K(iT, T) 

for iv: E r 

Compute A^"^ according to (j55p 
Add A^ to A according to lk 
end for 



Considering first the second operation of inserting (adding) the entries of A into the 
global sparse tensor A, this may in principle be accomplished by iterating over all i (z Ik 
and adding the entry Af at position ixii) of A as illustrated in Figure HI However, sparse 
matrix libraries such as PETSc [9l [HI [10] often provide optimized routines for this type of 
operation, which may significantly improve the performance compared to accessing each 
entry of A individually as in Algorithm [TJ Even so, the cost of adding A^ to A may be 
substantial even with an efficient implementation of the sparse data structure for A, see 

m- 

A similar approach can be taken to the first step of computing the element tensor, that 
is, an optimized library routine is called to compute the element tensor. Because of the 
wide variety of multilinear forms that appear in applications, a separate implementation is 
needed for any given multilinear form. Therefore, the implementation of this code is often 
left to the user, as illustrated above in Section [2^ and Section [Z3l but the code in question 
may also be automatically generated and optimized for each given multilinear form. We 
shall return to this question below in Section [5] and Section [9) 
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4(2) 4(3) 




Figure 4. Adding the entries of the element tensor A to the global tensor A 
using the local-to-global mapping lk, illustrated here for a rank two 
tensor (a matrix). 



3.6 Summary 

If we thus view the finite element method as a machine that automates the discretization 
of differential equations, or more precisely, a machine that generates the system of discrete 
equations (j3ip from a given variational problem (|29p . an automation of the finite element 
method is straightforward up to the point of computing the element tensor for any given 
multilinear form and the local-to-global mapping for any given discrete function space; if 
the element tensor and the local-to-global mapping lk can be computed on any given 
cell K, the global tensor A may be computed by Algorithm [2j 

Assuming now that each of the discrete function spaces involved in the definition of 
the variational problem (|29|) is generated on some mesh T of the domain from some 
reference finite element {Kq,Vo,Mo) by a set of local-to-global mappings {lk^k&t and a 
set of mappings {Fk}k&t from the reference cell Kq, as discussed in Section [3]2l we identify 
the following key steps towards an automation of the finite element method: 

• the automatic and efficient tabulation of the nodal basis functions on the reference 
finite element {Kq,Vo,Nq)\ 

• the automatic and efficient evaluation of the element tensor A^ on each cell K G T; 

• the automatic and efficient assembly of the global tensor A from the set of element 
tensors {A^}KeT ^'^^d the set of local-to-global mappings {lk}k€T- 

We discuss each of these key steps below. 



Automating the Finite Element Method 



21 



4 AUTOMATING THE TABULATION OF BASIS FUNCTIONS 

Given a reference finite element (KqjVqjJVo), we wish to generate the unique nodal basis 
{*^j}r=i ^0 satisfying 

u^{<^j) = 6ij, i,j = 1,2,..., no. (56) 

In some simple cases, these nodal basis functions can be worked out analytically by hand or 
found in the literature, see for example |121t l71|. As a concrete example, consider the nodal 
basis functions in the case when Vq is the set of quadratic polynomials on the reference 
triangle Kq with vertices at = (0, 0), v'^ = (1,0) and = (0, 1) as in Figure [S] and nodes 
A/q = {i^i, ^2 , ■ ■ ■ , z^el given by point evaluation at the vertices and edge midpoints. A basis 
for Vq is then given by 

$i(X) = (1 - - X2){1 - 2Xi - 2X2), 
^2{X) = Xi{2Xi-l), 
<!>s{X) = X2{2X2-l), 

<^4{X) = 4X1X2, ^ ' 

«>5(X)= 4X2(1 -X1-X2), 
«>6(X)=4Xi(l-Xi-X2), 

and it is easy to verify that this is the nodal basis. However, in the general case, it may be 
very difficult to obtain analytical expressions for the nodal basis functions. Furthermore, 
copying the often complicated analytical expressions into a computer program is prone to 
errors and may even result in inefficient code. 

In recent work, Kirby [83 1 182 1 [8^ has proposed a solution to this problem; by expanding 
the nodal basis functions for as linear combinations of another (non-nodal) basis for Vq 
which is easy to compute, one may translate operations on the nodal basis functions, such as 
evaluation and differentiation, into linear algebra operations on the expansion coefficients. 

This new linear algebraic approach to computing and representing finite element basis 
functions removes the need for having explicit expressions for the nodal basis functions, 
thus simplifying or enabling the implementation of complicated finite elements. 

4.1 Tabulating Polynomial Spaces 

To generate the set of nodal basis functions for Vq, we must first identify some 

other known basis {^ij^Ii for Vq, referred to in j82j as the •prime basis. We return to the 
question of how to choose the prime basis below. 

Writing now each <I>j as a linear combination of the prime basis functions with a G W^^'^ 
the matrix of coefficients, we have 

no 

<^i = ^a^j^j, i = l,2,...,nQ. (58) 
The conditions thus translate into 

no 



k=l 

or 



6ij = uf{<^j) = Y,ajkJ^ii^k), i,j = l,2,...,no, (59) 
Va^ = /, (60) 
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where V € M"o^"o jg ^j^g (Vandermonde) matrix with entries Vij = and / is the 

no X 77-0 identity matrix. Thus, the nodal basis is easily computed by first computing 

the matrix V by evaluating the nodes at the prime basis functions and then solving the 
linear system ()60p to obtain the matrix a of coefficients. 

In the simplest case, the space Vq is the set Pg^Ko) of polynomials of degree < g on Kq. 
For typical reference cells, including the reference triangle and the reference tetrahedron 
shown in Figure O orthogonal prime bases are available with simple recurrence relations 
for the evaluation of the basis functions and their derivatives, see for example [33]. If 
T^o = PqiK^), it is thus straightforward to evaluate the prime basis and thus to generate 
and solve the linear system (|60p that determines the nodal basis. 

4.2 Tabulating Spaces with Constraints 

In other cases, the space Vq may be defined as some subspace of Pq^Kq), typically by 
constraining certain derivatives of the functions in Vq or the functions themselves to lie 
in Pq/(i^o) for some q' < q on some part of Kq. Examples include the the Raviart- 
Thomas Brezzi-Douglas-Fortin-Marini [23] and Arnold-Winther [1] elements, which 

put constraints on the derivatives of the functions in Vq. 

Another more obvious example, taken from [82], is the case when the functions in Vq 
are constrained to Pg_i(7o) on some part 70 of the boundary of Kq but are otherwise in 
PqlKo), which may be used to construct the function space on a p-refined cell K if the 
function space on a neighboring cell K' with common boundary 70 is only Pq-i{K'). We 
may then define the space Vq by 

Vo = {ve Pg{Ko) : v\^, G P,-i(7o)} = {ve PgiKo) : /(t;) = 0}, (61) 
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where the hnear functional I is given by integration against the qth. degree Legendre poly- 
nomial along the boundary 79. 

In general, one may define a set of linear functionals (constraints) and define Vq 

as the intersection of the null spaces of these linear functionals on Pq{KQ), 

Vo = {v e Pg{Ko) : kiv) = 0, i = l,2,...,nj. (62) 

To find a prime basis for Vq, we note that any function in Vq may be expressed as 

a linear combination of some basis functions for Pq{KQ), which we may take as 

the orthogonal basis discussed above. We find that if ^' = Yli^i^'^^^ f^i^i^ then 

1^^5(^0)1 

= /,(^')= W^j)^ i = 1,2,..., Tie, (63) 

j=i 

or 

L/3 = 0, (64) 
where L is the ric x |Pg(ii'o)| matrix with entries 

Lij = k{^j), i = l,2,...,n„ j = l,2,...,\Pg{Ko)\. (65) 

A prime basis for Vq may thus be found by computing the nullspace of the matrix L, for 
example by computing its singular value decomposition (see |58j). Having thus found the 
prime basis may proceed to compute the nodal basis as before. 

5 AUTOMATING THE COMPUTATION OF THE ELEMENT TENSOR 

As we saw in Section 13.51 given a multilinear form a defined on the product space x 
X . . . X , we need to compute for each cell K the rank r element tensor given 

by 

Af = aK{4>f,''A^/,...,ct>^:l V.gXk, (66) 

where ax is the local contribution to the multilinear form a from the cell K. 

We investigate below two very different ways to compute the element tensor, first a 
modification of the standard approach based on quadrature and then a novel approach 
based on a special tensor contraction representation of the element tensor, yielding speedups 
of several orders of magnitude in some cases. 

5.1 Evaluation by Quadrature 

The element tensor is typically evaluated by quadrature on the cell K. Many finite 
element libraries like Diffpack |24t and deal. II \12\ [THl [TT] provide the values of relevant 
quantities like basis functions and their derivatives at the quadrature points on K by 
mapping precomputed values of the corresponding basis functions on the reference cell 
Kq using the mapping Fk ■ Kq — >■ K. 

Thus, to evaluate the element tensor A^ for Poisson's equation by quadrature on K, 
one computes 

= / V<^,^'^ • V<'' dx^Yl ^fcV0j'^(x^) • V4>^/{x') det Fi,(x^), (67) 
•^^ k=i 
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for some suitable set of quadrature points {x'^y^^i C K with corresponding quadrature 

weights {wi}^2ii where we assume that the quadrature weights are scaled so that Yld^i ^« — 
\Kq\. Note that the approximation (I67|) can be made exact for a suitable choice of quadra- 
ture points if the basis functions are polynomials. 

Comparing (j67p to the example codes in Table [3] and Table HI we note the similarities 
between ([67|) and the two codes. In both cases, the gradients of the basis functions as well 
as the products of quadrature weight and the determinant of F'^ are precomputed at the 
set of quadrature points and then combined to produce the integral ([67|) . 

If we assume that the two discrete spaces and V^^ are equal, so that the local basis 

functions {(j)^ ' and {(f>^ ' are all generated from the same basis {^i}^^^^ on the 
reference cell Kq, the work involved in precomputing the gradients of the basis functions at 
the set of quadrature points amounts to computing for each quadrature point Xk and each 
basis function (pf the matrix-vector product V^'/'f (x^) = {F'j^)~~^ {xk)'Vx^i{Xk), that is. 




where x'' = Fk{X^) and (^f = ^ioFj^^. Note that the the gradients {Vx$»(Xfc)}"^'/J^^ of 
the reference element basis functions at the set of quadrature points on the reference element 
remain constant throughout the assembly process and may be pretabulated and stored. 
Thus, the gradients of the basis functions on K may be computed in NgUQCP multiply- 
add pairs (MAPs) and the total work to compute the element tensor is NgTiQcP + 
NgnQ(d + 2) NqTi^d, if we ignore that we also need to compute the mapping F^-, and the 
determinant and inverse of F'j^. In Section [5.21 and Section [7| below, we will see that this 
operation count may be significantly reduced. 

5.2 Evaluation by Tensor Representation 

It has long been known that it is sometimes possible to speed up the computation of the 
element tensor by precomputing certain integrals on the reference element. Thus, for any 
specific multilinear form, it may be possible to find quantities that can be precomputed in 
order to optimize the code for the evaluation of the element tensor. These ideas were first 
introduced in a general setting in [85l [86] and later formalized and automated in [871 EH] • 
A similar approach was implemented in early versions of DOLFIN |62 [ I68 [ [63]. but only for 
piecewise linear elements. 

We first consider the case when the mapping Fk from the reference cell is affine, and then 
discuss possible extensions to non-afhne mappings such as when Fk is the isoparametric 
mapping. As a first example, we consider again the computation of the element tensor A^ 
for Poisson's equation. As before, we have 

but instead of evaluating the gradients on K and then proceeding to evaluate the integral 
by quadrature, we make a change of variables to write 
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and thus, if the mapping Fk is afhne so that the transforms dX/dx and the determi- 
nant det-F^ are constant, we obtain 



13 (J-t-IJ JKo '-'^ai U-^a2 

= A^ -.Gk, (72) 



ai=la2=l/3=l ^ ^ ---"1 ---"2 Q,i = iQ,2=i 

or 



where 



(■ (5(1)1 

= I -—^——^dX 



(73) 



We refer to the tensor as the reference tensor and to the tensor Gk as the geometry 
tensor. 

Now, since the reference tensor is constant and does not depend on the cell K, it may 
be precomputed before the assembly of the global tensor A. For the current example, the 
work on each cell K thus involves first computing the rank two geometry tensor G^ , which 
may be done in cP' multiply-add pairs, and then computing the rank two element tensor A^ 
as the tensor contraction (|72p . which may be done in n^<i^ multiply-add pairs. Thus, the 
total operation count is d? + n^cP' ~ n^fiP^ which should be compared to NqU^d for the 
standard quadrature-based approach. The speedup in this particular case is thus roughly 
a factor Nq/d, which may be a significant speedup, in particular for higher order elements. 

As we shall see, the tensor representation ()72p generalizes to other multilinear forms as 
well. To see this, we need to make some assumptions about the structure of the multilinear 
form ()44p . We shall assume that the multilinear form a is expressed as an integral over il. of 
a weighted sum of products of basis functions or derivatives of basis functions. In particular, 
we shall assume that the element tensor A^ can be expressed as a sum, where each term 
takes the following canonical form, 

nc,(7)I?f^'^"V5l,)K-(7)]dx, (74) 

where C is some given set of multiindices, each coefficient Cj maps the multiindex 7 to a 
real number, Lj maps (^,7) to a basis function index, kj maps 7 to a component index 
(for vector or tensor valued basis functions) and 6j maps 7 to a derivative multiindex. 
To distinguish component indices from indices for basis functions, we use [•] to denote a 
component index and subscript to denote a basis function index. In the simplest case, the 
number of factors m is equal to the arity r of the multilinear form (rank of the tensor), 
but in general, the canonical form (|74p may contain factors that correspond to additional 
functions which are not arguments of the multilinear form. This is the case for the weighted 
Poisson's equation (|47p . where m = 3 and r = 2. In general, we thus have m > r. 

As an illustration of this notation, we consider again the bilinear form for Poisson's 
equation and write it in the notation of (j74p . We will also consider a more involved example 
to illustrate the generality of the notation. From (|69p . we have 

Af= / Y-^ p^dx = y / -^-^dx, (75) 

J K OX^ Jk ox^ ox^ 
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and thus, in the notation of (j74p . 

m = 2, 
C = [l,d], 
c(7) = (l,l), 

'^(7) = (0,0), 
^(t) = (7,7), 

where denotes an empty component index (the basis functions are scalar). 

As another example, we consider the bilinear form for a stabilization term appearing in 
a least-squares stabilized cG(l)cG(l) method for the incompressible Navier-Stokes equa- 
tions 



a{v, U)= f {w Vv) ■{wVU)dx= [ ^ uj[j,]^^nj[js]^^ dx, (77) 

■^'^71,72,73=1 '^^73 

where w G Vf^ = is a given approximation of the velocity, typically obtained from the 
previous iteration in an iterative method for the nonlinear Navier-Stokes equations. To 
write the element tensor for (|77p in the canonical form (|74|) . we expand w in the nodal 
basis for Vf^ = Vj^ and note that 



^f- t E E / ^^«''N«/Wd.. (78) 

71,72,73=174=175=1 

We may then write the element tensor for the bilinear form (j77p in the canonical 
form ([^, with 

m = 4, 

C = [l,df x[l,nj,]x[l,n%], 

c(7) = (l,l,w':^,w^^), (79^ 
4«i7) = (n,«2,74,75), 
^^(7) = (71,71,72,73), 
<5(7) = (72,73,0,0), 

where denotes an empty derivative multiindex (no differentiation). 

In [88], it is proved that any element tensor that can be expressed in the general 
canonical form (j74p , can be represented as a tensor contraction A^ = A^ : Gk of a reference 
tensor A9 independent of K and a geometry tensor Gk- A similar result is also presented 
in [S7] but in less formal notation. As noted above, element tensors that can be expressed 
in the general canonical form correspond to multilinear forms that can be expressed as 
integrals over of linear combinations of products of basis functions and their derivatives. 
The representation theorem reads as follows. 
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Theorem 1 (Representation theorem) If Fk is a given affine mapping from a refer- 
ence cell Kq to a cell K and {Vj^}JLi is a given set of discrete function spaces on K , 
each generated by a discrete function space Vq on the reference cell Kq through the affine 
mapping, that is, for each (f) € Vj^ there is some $ E "Pq such that $ = <^ o Fx, then the 
element tensor (I74p may be represented as the tensor contraction of a reference tensor 
and a geometry tensor Gk, 

= A^ -.Gk, (80) 

that is, 

Af = Y,AlG% yielK, (81) 

where the reference tensor A^ is independent of K. In particular, the reference tensor A" 
is given by 

^'o. = J2[ n^x"''^^i(..,/3)K-K/5)]dX, (82) 
/3GB-^^o j=i 

and the geometry tensor Gk is the outer product of the coefficients of any weight functions 
with a tensor that depends only on the Jacobian F^, 

G^=nc»detF;,j:n n ^^-^^ (^3) 

i=i p&B' r=i k=i ^■^^j'kia,P) 

for some appropriate index sets A, B and B' . We refer to the index set Ik as the set of 
primary indices, the index set A as the set of secondary indices, and to the index sets B 
and B' as sets of auxiliary indices. 

The ranks of the tensors A^ and Gk are determined by the properties of the multihnear 
form a, such as the number of coefficients and derivatives. Since the rank of the element 
tensor is equal to the arity r of the multilinear form a, the rank of the reference tensor A^ 
must be \ia\ = r + \a\, where \a\ is the rank of the geometry tensor. For the examples 
presented above, we have \ia\ = 4 and \a\ = 2 in the case of Poisson's equation and \ia\ = 8 
and |a| = 6 for the Navier-Stokes stabilization term. 

The proof of Theorem [T] is constructive and gives an algorithm for computing the rep- 
resentation (j80p . A number of concrete examples with explicit formulas for the reference 
and geometry tensors are given in Tables [T0HT3l We return to these test cases below in 
Section 19.21 when we discuss the implementation of Theorem [1] in the form compiler FFC 
and present benchmark results for the test cases. 

We remark that in general, a multilinear form will correspond to a sum of tensor con- 
tractions, rather than a single tensor contraction as in (|8Up . that is, 

A^ = ^AO'^G^,fc. (84) 

k 

One such example is the computation of the element tensor for the convection-reaction 
problem —Au + u = f, which may be computed as the sum of a tensor contraction of 
a rank four reference tensor A^'^ with a rank two geometry tensor Gk,i and a rank two 
reference tensor A^''^ with a rank zero geometry tensor Gk2- 
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a{v,U) = J^vUdx 


rank 




\ia\ = 2 




\a\ = 



Table 10. The tensor contraction representation A = A : Gk of the element 
tensor A^ for the bilinear form associated with a mass matrix (test 
case 1). 



a{v,U) = ff^Vv-VUdx 


rank 


At - Z'' Z'' dX 

lOL J Ko oXa^ dXa2 


\ia\ = 4 


r<Ot _ J7' dXa^ dXa2 


\a\ = 2 



Table 11. The tensor contraction representation A*^ — A^ : Gk of the element 
tensor A^ for the bilinear form associated with Poisson's equation (test 
case 2). 



a{v,U) = J^v ■ {wV)Udx 


rank 




\ia\ = 5 




\a\ = 3 



Table 12. The tensor contraction representation A'^ — A^' : Gk of the element 
tensor for the bilinear form associated with a linearization of the 
nonlinear term u ■ Vu in the incompressible Navier-Stokes equations 
(test case 3). 
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a{v,U) = f^e{v) : e{U)dx 


rank 


^ia — 1^13=1 Jko dXo,^ c)X„2 


\ia\ = 4 


na 1 J„+ Z7/ dXa-^ dXa2 
— 2 2^/3=1 dxfi dxp 


\a\ = 2 



Table 13. The tensor contraction representation = : Gk of the element 
tensor for the bilinear form /^^ e{v) : e(JJ) Ax = j{^v + (Vu)^) : 
(V[/ + (Vf/)^) da; associated with the strain-strain term of linear elas- 
ticity (test case 4). Note that the product expands into four terms 
which can be grouped in pairs of two. The representation is given only 
for the first of these two terms. 



5.3 Extension to Non-AfRne Mappings 

The tensor contraction representation (jSOp of Tlieorem [1] assumes tliat the mapping Fk 
from the reference cell is affine, allowing the transforms dX/dx and the determinant to be 
pulled out of the integral. To see how to extend this result to the case when the mapping Fk 
is non-affine, such as in the case of an isoparametric mapping for a higher-order element 
used to map the reference cell to a curvilinear cell on the boundary of we consider again 
the computation of the element tensor for Poisson's equation. As in Section 15. H we 
use quadrature to evaluate the integral, but take advantage of the fact that the discrete 
function spaces V]^ and V\ on K may be generated from a pair of reference finite elements 
as discussed in Section 13.21 We have 

* Jk Jk^^, dxp dxp 

- V V V / ^^^^ ^'^^ det F' dX 

d d ^1 „ 2 FIX r)Y 

-EE E-"3^(^.3)^(^.3)EiS^(^-)^(^-)deti^;.(^.3)- 

(85) 

As before, we thus obtain a representation of the form 

A^ = A^ : Gk, (86) 
where the reference tensor ^ is now given by 

^''^ = "^"^SAt^^^^^SA^^^"^^' ^^^^ 
and the geometry tensor Gk is given by 

G-K = det Fi,(A„3) X; ^(^"3)^(^"3)- 
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We thus note that a (different) tensor contraction representation of the element tensor A 
is possible even if the mapping Fx is non-affine. One may also prove a representation 
theorem similar to Theorem [T] for non-affine mappings. 

Comparing the representation (j87p - (|88p with the affine representation (j73p . we note 
that the ranks of both and Gk have increased by one. As before, we may precompute 
the reference tensor A^ but the number of multiply-add pairs to compute the element 
tensor A^ increase by a factor Nq from HqcP to Nqn^dP (if again we ignore the cost of 
computing the geometry tensor). 

We also note that the cost has increased by a factor d compared to the cost of a 
direct application of quadrature as described in Section 15.11 However, by expressing the 
element tensor A^ as a tensor contraction, the evaluation of the element tensor is more 
readily optimized than if expressed as a triply nested loop over quadrature points and basis 
functions as in Tableland Tabled! 

As demonstrated below in Section [71 it may in some cases be possible to take advantage 
of special structures such as dependencies between different entries in the tensor A^ to 
significantly reduce the operation count. Another more straightforward approach is to use 
an optimized library routine such as a BLAS call to compute the tensor contraction as we 
shall see below in Section 17.11 

5.4 A Language for Multilinear Forms 

To automate the process of evaluating the element tensor A^ ^ we must create a system that 
takes as input a multilinear form a and automatically computes the corresponding element 
tensor A^ . We do this by defining a language for multilinear forms and automatically 
translating any given string in the language to the canonical form (j74p . Prom the canonical 
form, we may then compute the element tensor A^ by the tensor contraction A^"^ = ^ : Gk- 

When designing such a language for multilinear forms, we have two things in mind. 
First, the multilinear forms specified in the language should be "close" to the corresponding 
mathematical notation (taking into consideration the obvious limitations of specifying the 
form as a string in the ASCII character set). Second, it should be straightforward to 
translate a multilinear form specified in the language to the canonical form (j74p . 

A language may be specified formally by defining a formal grammar that generates the 
language. The grammar specifies a set of rewrite rules and all strings in the language can be 
generated by repeatedly applying the rewrite rules. Thus, one may specify a language for 
multilinear forms by defining a suitable grammar (such as a standard EBNF grammar [75] ) , 
with basis functions and multiindices as the terminal symbols. One could then use an 
automating tool (a compiler-compiler) to create a compiler for multilinear forms. 

However, since a closed canonical form is available for the set of possible multilinear 
forms, we will take a more explicit approach. We fix a small set of operations, allowing only 
multilinear forms that have a corresponding canonical form (|74p to be expressed through 
these operations, and observe how the canonical form transforms under these operations. 

5.4-1 An algebra for multilinear forms 

Consider the set of local finite element spaces {Vj^-Yj^i on a cell K corresponding to a 

set of global finite element spaces {Vfi}Y=i- The set of local basis functions {4'f'^}^fjli 
span a vector space Vk and each function v in this vector space may be expressed as a 
linear combination of the basis functions, that is, the set of functions Vk may be generated 
from the basis functions through addition v + w and multiplication with scalars av. Since 
V — w = V + {—^)w and v/a = {l/a)v, we can also easily equip the vector space with 
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subtraction and division by scalars. Informally, we may thus write 

rK = {v:v = ^c^.)<t>f)}. (89) 

We next equip our vector space Vk with multiplication between elements of the vector 
space. We thus obtain an algebra (a vector space with multiplication) of linear combinations 
of products of basis functions. Finally, we extend our algebra Vk by differentiation d/dxi 
with respect to the coordinate directions on K, to obtain 



Vj,= \v:v = Y,c^.)ll^-^\, (90) 



where (•) represents some multiindex. 

To summarize, Vk is the algebra of linear combinations of products of basis functions 
or derivatives of basis functions that is generated from the set of basis functions through 
addition (+), subtraction (— ), multiplication (•), including multiplication with scalars, di- 
vision by scalars (/), and differentiation d/dxi. We note that the algebra is closed under 
these operations, that is, applying any of the operators to an element v € Vk or a pair of 
elements v,w € Vk yields a member of Vk- 

If the basis functions are vector- valued (or tensor- valued), the algebra is instead gen- 
erated from the set of scalar components of the basis functions. Furthermore, we may 
introduce linear algebra operators, such as inner products and matrix-vector products, and 
differential operators, such as the gradient, the divergence and rotation, by expressing these 
compound operators in terms of the basic operators (addition, subtraction, multiplication 
and differentiation). 

We now note that the algebra Vk corresponds precisely to the canonical form (|74p in 
that the element tensor for any multilinear form on K that can be expressed as an 
integral over K of an element v S Vk has an immediate representation as a sum of element 
tensors of the canonical form (j74p . We demonstrate this below. 

5.4-2 Examples 

As an example, consider the bilinear form 

a{v,U) = / vUdx, (91) 
with corresponding element tensor canonical form 



Af= / ^?'VJ^dx. (92) 

If we now let v = (/>^'^ € Vk and U = c/)^'^ G Vk, we note that vU G Vk and we may 
thus express the element tensor as an integral over K of an element in Vk, 

Af = [ vUdx, (93) 



JK 

which is close to the notation of ()9ip . As another example, consider the bilinear form 



a{v,U)= [ VvWU + vUdx, (94) 
Jn 
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with corresponding element tensor canonical forir(3 

As before, we let v = t/)^'"*^ G Vk and U = 0^'^ € and note that Vf • WU + vU G Vk- 
It thus follows that the element tensor for the bilinear form (|94p may be expressed as 
an integral over K of an element in Vk-, 

Af=[ VvVU + vUdx, (96) 
Jk 

which is close to the notation of (|94p . Thus, by a suitable definition of v and U as local 
basis functions on K, the canonical form (j74p for the element tensor of a given multilinear 
form may be expressed in a notation that is close to the notation for the multilinear form 
itself. 

5.4-3 Implementation by operator- overloading 

It is now straightforward to implement the algebra Vk in any object-oriented language 
with support for operator overloading, such as Python or C++. We first implement a class 
BasisFunction, representing (derivatives of) basis functions of some given finite element 
space. Each BasisFunction is associated with a particular finite element space and different 
BasisFunctions may be associated with different finite element spaces. Products of scalars 
and (derivatives of) basis functions are represented by the class Product, which may be 
implemented as a list of BasisFunctions. Sums of such products are represented by the 
class Sum, which may be implemented as a list of Products. We then define an operator 
for differentiation of basis functions and overload the operators addition, subtraction and 
multiplication, to generate the algebra of BasisFunctions, Products and Sums, and note 
that any combination of such operators and objects ultimately yields an object of class Sum. 
In particular, any object of class BasisFunction or Product may be cast to an object of 
class Sum. 

By associating with each object one or more indices, implemented by a class Index, 
an object of class Product automatically represents a tensor expressed in the canonical 
form ()74p . Finally, we note that we may introduce compound operators such as grad, div, 
rot, dot etc. by expressing these operators in terms of the basic operators. 

Thus, if V and U are objects of class BasisFunction, the integrand of the bilinear 
form (j94p may be given as the string 

dot(grad(v), grad(U)) + v*U. (97) 

In Table we saw a similar example of how the bilinear form for Poisson's equation is 
specified in the language of the FEniCS Form Compiler FFC. Further examples will be 
given below in Section [9.21 and Section [TOl 

6 AUTOMATING THE ASSEMBLY OF THE DISCRETE SYSTEM 

In Section O we reduced the task of automatically generating the discrete system F{U) = 
for a given nonlinear variational problem a{U ; v) = L{v) to the automatic assembly of the 



■^To be precise, the element tensor is the sum of two element tensors, each written in the canonical 
form (|74|l with a suitable definition of multiindices t, k, and S. 



Automating the Finite Element Method 



33 



tensor A that represents a given multilinear form a in a given finite element basis. By 
Algorithm [2l this process may be automated by automating first the computation of the 
element tensor , which we discussed in the previous section, and then automating the 
addition of the element tensor A^ into the global tensor A, which is the topic of the current 
section. 

6.1 Implementing the Local-to-Global Mapping 

With the local-to-global mappings for a set of discrete function spaces, {V^}j=i, 

we evaluate for each j the local-to-global mapping on the set of local node numbers 
{1,2,..., n]^}, thus obtaining for each j a tuple 

= (4(i)>4(2),---,4K))- (98) 

The entries of the element tensor A^ may then be added to the global tensor A by an 
optimized low-level library cal0 that takes as input the two tensors A and A^ and the set 
of tuples (arrays) that determine how each dimension of A^ should be distributed onto the 
global tensor A. Compare Figure H] with the two tuples given by i]^{2), l]^{3)) and 

(i|.(l), i|.(2), ^2^(3)) respectively. 

Now, to compute the set of tuples ?T']^])}j=i, we may consider implementing for 

each j a function that takes as input the current cell K and returns the corresponding tuple 
^■^([1, uk])- Since the local-to-global mapping may look very different for different function 
spaces, in particular for different degree Lagrange elements, a different implementation is 
needed for each different function space. Another option is to implement a general purpose 
function that handles a range of function spaces, but this quickly becomes inefficient. Prom 
the example implementations given in Table [TH and Table [TJ] for continuous linear and 
quadratic Lagrange finite elements on tetrahedra, it is further clear that if the local-to-global 
mappings are implemented individually for each different function space, the mappings can 
be implemented very efficiently, with minimal need for arithmetic or branching. 



void nodemapCint nodes [] , const Cell& cell, const Meshfe mesh) 
{ 

nodes [0] = cell . vertexID(O) ; 

nodes[l] = cell . vertexID(l) ; 

nodes [2] = cell . vertexID(2) ; 

nodes [3] = cell . vertexID(3) ; 

> 



Table 14. A CH — h implementation of the mapping from local to global node num- 
bers for continuous linear Lagrange finite elements on tetrahedra. One 
node is associated with each vertex of a local cell and the local node 
number for each of the four nodes is mapped to the global number of 
the associated vertex. 



''if PETSc [9] |8] [To] is used as the linear algebra backend, such a library call is available with the call 
VecSet Values () for a rank one tensor (a vector) and MatSetValues () for a rank two tensor (a matrix). 
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void nodemap ( int nodes [] , const Cell& cell, const Meshfe mesh) 
{ 

cell . vertexID(O) ; 
cell . vertexID(l) ; 
cell . vertexID(2) ; 
cell . vertexID(3) ; 
= mesh.numVerticesO ; 
offset + cell.edgelD(O) 
cell.edgelD(l) 
cell.edgeID(2) 
cell.edgelDO) 
cell.edgeID(4) 
cell.edgeID(5) 



nodes [0] = 
nodes [1] = 
nodes [2] = 
nodes [3] = 
int offset 
nodes [4] 
nodes [5] 
nodes [6] 
nodes [7] 
nodes [8] 
nodes [9] 



offset 
offset 
offset 
offset 
offset 



Table 15. A CH — h implementation of the mapping from local to global node num- 
bers for continuous quadratic Lagrange finite elements on tetrahedra. 
One node is associated with each vertex and also each edge of a local 
cell. As for linear Lagrange elements, local vertex nodes are mapped to 
the global number of the associated vertex, and the remaining six edge 
nodes are given global numbers by adding to the global edge number 
an offset given by the total number of vertices in the mesh. 



6.2 Generating the Local-to-Global Mapping 

We thus seek a way to automatically generate the code for the local-to-global mapping 
from a simple description of the distribution of nodes on the mesh. As before, we restrict 
our attention to elements with nodes given by point evaluation. In that case, each node 
can be associated with a geometric entity, such as a vertex, an edge, a face or a cell. 
More generally, we may order the geometric entities by their topological dimension to 
make the description independent of dimension-specific notation (compare [80]): for a two- 
dimensional triangular mesh, we may refer to a (topologically two-dimensional) triangle as 
a cell, whereas for a three-dimensional tetrahedral mesh, we would refer to a (topologically 
two-dimensional) triangle as a face. We may thus for each topological dimension list the 
nodes associated with the geometric entities within that dimension. More specifically, we 
may list for each topological dimension and each geometric entity within that dimension a 
tuple of nodes associated with that geometric entity. This approach is used by the Finite 
element Automatic Tabulator FIAT [83l[82l[8i] . 

As an example, consider the local-to-global mapping for the linear tetrahedral element 
of Table [TH Each cell has four nodes, one associated with each vertex. We may then 
describe the nodes by specifying for each geometric entity of dimension zero (the vertices) 
a tuple containing one local node number, as demonstrated in Table [TBI Note that we may 
specify the nodes for a discontinuous Lagrange finite element on a tetrahedron similarly by 
associating all for nodes with topological dimension three, that is, with the cell itself, so 
that no nodes are shared between neighboring cells. 

As a further illustration, we may describe the nodes for the quadratic tetrahedral ele- 
ment of Table [15] by associating the first four nodes with topological dimension zero (ver- 
tices) and the remaining six nodes with topological dimension one (edges), as demonstrated 
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in Table dZl 

Finally, we present in Table [TH] the specification of the nodes for fifth-degree Lagrange 
finite elements on tetrahedra. Since there are now multiple nodes associated with some 
entities, the ordering of nodes becomes important. In particular, two neighboring tetrahedra 
sharing a common edge (face) must agree on the global node numbering of edge (face) 
nodes. This can be accomplished by checking the orientation of geometric entities with 
respect to some given conventionj^] For each edge, there are two possible orientations and 
for each face of a tetrahedron, there are six possible orientations. In Table UM. we present 
the local-to- global mapping for continuous fifth-degree Lagrange finite elements, generated 
automatically from the description of Table [TH] by the FEniCS Form Compiler FFC [981 [871 

MM- 



d = (1) - (2) - (3) - (4) 



Table 16. Specifying the nodes for continuous linear Lagrange finite elements on 
tetrahedra. 



d = 


(1) - (2) - (3) - (4) 


d = l 


(5) - (6) - (7) - (8) - (9) - (10) 



Table 17. Specifying the nodes for continuous quadratic Lagrange finite elements 
on tetrahedra. 



We may thus think of the local-to-global mapping as a function that takes as input the 
current cell K (cell) together with the mesh T (mesh) and generates a tuple (nodes) that 
maps the local node numbers on K to global node numbers. For finite elements with nodes 
given by point evaluation, we may similarly generate a function that interpolates any given 
function to the current cell K by evaluating it at the nodes. 



d = 


(1) - (2) - (3) - (4) 


d = l 


(5, 6, 7, 8) - (9, 10, 11, 12) - (13, 14, 15, 16) - 
(17, 18, 19, 20) - (21, 22, 23, 24) - (25, 26, 27, 28) 


d = 2 


(29,30,31,32,33,34) - (35,36,37,38,39,40) - 
(41, 42, 43, 44, 45, 46) - (47, 48, 49, 50, 51, 52) 


d = 3 


(53,54,55,56) 



Table 18. Specifying the nodes for continuous fifth-degree Lagrange finite ele- 
ments on tetrahedra. 



7 OPTIMIZATIONS 

As we saw in Section [5l the (affine) tensor contraction representation of the element tensor 
for Poisson's equation may significantly reduce the operation count in the computation of 
the element tensor. This is true for a wide range of multilinear forms, in particular test 
cases 1-4 presented in Tables [T0HT3l 



^For an example of such a convention, see [63] or [99| . 
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void nodemapCint nodes[], const Cellfe cell, const Meshfe mesh) 
{ 

static unsigned int edge_reordering [2] [4] = {{0, 1, 2, 3}, {3, 2, 1, O}}; 

static unsigned int f ace_reordering [6] [6] = {{0, 1, 2, 3, 4, 5}-, 

{0, 3, 5, 1, 4, 2>, 
{5, 3, 0, 4, 1, 2>, 
{2, 1, 0, 4, 3, 5}, 
{2, 4, 5, 1, 3, 0}, 
{5, 4, 2, 3, 1, 0}}; 

nodes [0] = cell . vertexID(O) ; 

nodes[l] = cell . vertexID(l) ; 

nodes [2] = cell . vertexID(2) ; 

nodes [3] = cell . vertexID(3) ; 

int alignment = cell . edgeAlignment (0) ; 

int offset = mesh.numVerticesO ; 

nodes[4] = offset + 4*cell . edgelD(O) + edge_reordering [alignment] [0] ; 
nodes[5] = offset + 4*cell . edgelD(O) + edge_reordering [alignment] [1] ; 
nodes[6] = offset + 4*cell . edgelD(O) + edge_reordering [alignment] [2] ; 
nodes[7] = offset + 4*cell . edgelD(O) + edge_reordering [alignment] [3] ; 
alignment = cell . edgeAlignment (1) ; 

nodes[8] = offset + 4*cell . edgelD(l) + edge_reordering [alignment] [0] ; 
nodes[9] = offset + 4*cell . edgelD(l) + edge_reordering [alignment] [1] ; 
nodes[10] = offset + 4*cell . edgelD(l) + edge_reordering [alignment] [2] ; 
nodes[ll] = offset + 4*cell . edgelD (1) + edge_reordering [alignment] [3] ; 

alignment = cell . f aceAlignment (0) ; 
offset = offset + 4*mesh.numEdges() ; 

nodes[28] = offset + 6*cell.f acelD(O) + face_reordering [alignment] [0] 
nodes[29] = offset + 6*cell.f acelD(O) + face_reordering [alignment] [1] 
nodes[30] = offset + 6*cell.f acelD(O) + face_reordering [alignment] [2] 
nodes[31] = offset + 6*cell.f acelD(O) + face_reordering [alignment] [3] 
nodes[32] = offset + 6*cell.f acelD(O) + face_reordering [alignment] [4] 
nodes[33] = offset + 6*cell.faceID(0) + face_reordering [alignment] [5] 

offset = offset + 6*mesh.numFaces() ; 
nodes[52] = offset + 4*cell.id() + 0; 
nodes [53] = offset + 4*cell.id() + 1; 
nodes[54] = offset + 4*cell.id() + 2; 
nodes[55] = offset + 4*cell.id() + 3; 

} 



Table 19. A C++ implementation (excerpt) of the mapping horn local to global 
node numbers for continuous fifth-degree Lagrange finite elements on 
tetrahedra. One node is associated with each vertex, four nodes with 
each edge, six nodes with each face and four nodes with the tetrahedron 
itself. 
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In some cases however, it may be more efficient to compute the element tensor by 
quadrature, either using the direct approach of Section 15.11 or by a tensor contraction 
representation of the quadrature evaluation as in Section 15.31 Which approach is more 
efficient depends on the multilinear form and the function spaces on which it is defined. In 
particular, the relative efficiency of a quadrature-based approach increases as the number 
of coefficients in the multilinear form increases, since then the rank of the reference tensor 
increases. On the other hand, the relative efficiency of the (affine) tensor contraction 
representation increases when the polynomial degree of the basis functions and thus the 
number of quadrature points increases. See [HTj for a more detailed account. 

7.1 Tensor Contractions as Matrix Vector Products 

As demonstrated above, the representation of the element tensor A^"^ as a tensor contrac- 
tion may be generated automatically from a given multilinear form. To 
evaluate the element tensor A^ ^ it thus remains to evaluate the tensor contraction. A sim- 
ple approach would be to iterate over the entries {Af}i^xK and for each entry Af 
compute the value of the entry by summing over the set of secondary indices as outlined in 
Algorithm [3l 



Algorithm 3 A^"" = ComputeElementTensor() 

for i € Ik 

Af = 

for a e A 

Af = Af + AlG% 

end for 
end for 



Examining Algorithm [3l we note that by an appropriate ordering of the entries in A^ , 
A'^ and Gk, one may rephrase the tensor contraction as a matrix-vector product and call 
an optimized library routing for the computation of the matrix-vector product. 

■ IX I 

To see how to write the tensor contraction as a matrix-vector product, we let be 

an enumeration of the set of primary multiindices Ik and let be an enumeration of 

the set of secondary multiindices A. As an example, for the computation of the 6x6 element 
tensor for Poisson's equation with quadratic elements on triangles, we may enumerate the 
primary and secondary multiindices by 

{f}f-\ = {(1, 1), (1, 2), . . . , (1, 6), (2, 1), . . . , (6, 6)}, ^^^^ 
W}l='i = {(l,l),(l,2),(2,l),(2,2)}. 

By similarly enumerating the 36 entries of the 6x6 element tensor A^ and the four entries 
of the 2x2 geometry tensor Gk, one may define two vectors G M'^^ and gK G 
corresponding to the two tensors A^ and Gk respectively. 

In general, the element tensor A^ and the geometry tensor Gk may thus be flattened 



"^Such a library call is available with the standard level 2 BLAS [18] routine DGEMV, with optimized 
implementations provided for different architectures by ATLAS |109l 1119] 1120) . 
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to create the corresponding vectors a -H- A and gx •H- Gk, defined by 

„K - (aK aK aK ^T 

qk — yy'K 1 1 • • • ) '^K I 

Similarly, we define the x \ J\ matrix ]^ by 

A% = Al^^, j = 1,2,..., \1kI fc = l,2,...,|^|. (101) 

Since now 

1^1 Ml 

4 = = E ^Ic.G'k = E ^l^^Gi = AU9k)u. (102) 

a&A k=l k=l 

it follows that the tensor contraction A^ = AP : Gk corresponds to the matrix-vector 
product 

= iV- (103) 

As noted earlier, the element tensor A^ may generally be expressed as a sum of tensor 
contractions, rather than as a single tensor contraction, that is, 

A^ = YA'^':GK,k. (104) 

k 

In that case, we may still compute the (flattened) element tensor A^ by a single matrix- 
vector product. 



= ^V- (105) 



Having thus phrased the general tensor contraction (1104p as a matrix- vector product, 
we note that by grouping the cells of the mesh T into subsets, one may compute the set of 
element tensors for all cells in a subset by one matrix-matrix product (corresponding to a 
level 3 BLAS call) instead of by a sequence of matrix-vector products (each corresponding 
to a level 2 BLAS call), which will typically lead to improved floating-point performance. 
This is possible since the (flattened) reference tensor A^ remains constant over the mesh. 
Thus, if {Kk}k C T is a subset of the cells in the mesh, we have 

[a^i a^^ ...] = [A^gK, A^gx, . . .] = i° [gK, 9k, • • •] • (106) 

The optimal size of each subset is problem and architecture dependent. Since the geometry 
tensor may sometimes contain a large number of entries, the size of the subset may be 
limited by the available memory. 

7.2 Finding an Optimized Computation 

Although the techniques discussed in the previous section may often lead to good floating- 
point performance, they do not take full advantage of the fact that the reference tensor is 
generated automatically. In |85] and later in |89j . it was noted that by knowing the size 
and structure of the reference tensor at compile-time, one may generate very efficient code 
for the computation of the reference tensor. 



(100) 
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Letting gx S ' be the vector obtained by flattening the geometry tensor Gk as 
above, we note that each entry Af^ of the element tensor is given by the inner product 

Af = a'l-gK, (107) 

where is the vector defined by 



a. 



'-iAluAl.,...,Al,,y. (108) 



To optimize the evaluation of the element tensor, we look for dependencies between the 
vectors {a^jigx^ and use the dependencies to reduce the operation count. There are many 
such dependencies to explore. Below, we consider collinearity and closeness in Hamming 
distance between pairs of vectors a? and a^,. 

7.2.1 Collinearity 

We first consider the case when two vectors and a^i are collinear, that is, 

a°, = aa°, (109) 

for some nonzero a G M. If and a^, are collinear, it follows that 

Af = o°, • QK = (aa") • QK = aAf. (110) 

We may thus compute the entry A^ in a single multiplication, if the entry Af has already 
been computed. 

7.2.2 Closeness in Hamming distance 

Another possibility is to look for closeness between pairs of vectors a? and a?, in Hamming 
distance (see [29]), which is defined as the number entries in which two vectors differ. If 
the Hamming distance between and a^, is p, then the entry A^, may be computed from 
the entry A? in at most p multiply-add pairs. To see this, we assume that and a^/ differ 
only in the first p entries. It then follows that 

A^ = 4 .g^ = 4.g^ + ^(4^, - Al,)Gi =Af + i2i4a^ " 

fe=l k=l 

where we note that the vector (^,^1 - A'^^i,A'^,^2 - -4°^2, ■ ■ ■ ,-4°,^p - A^^p^ may be pre- 
computed at compile-time. We note that the maximum Hamming distance between 
and a° is p = \A\, that is, the length of the vectors, which is also the cost for the direct 
computation of an entry by the inner product p07p . We also note that if = a^, and 
consequently Af = A^ , then the Hamming distance and the cost of obtaining A^ from 
Af are both zero. 

7.2.3 Complexity-reducing relations 

In |89| . dependencies between pairs of vectors, such as collinearity and closeness in Hamming 
distance, that can be used to reduce the operation count in computing one entry from 
another, are referred to as complexity-reducing relations. In general, one may define for any 
pair of vectors and a^, the complexity-reducing relation p{a^,a^,) < \A\ as the minimum 
of all complexity complexity reducing relations found between and a?, . Thus, if we look 
for collinearity and closeness in Hamming distance, we may say that p{a'^, a^,) is in general 
given by the Hamming distance between and o?, unless the two vectors are collinear, in 
which case p{a^,aP^,) < 1. 
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7.2.4 Finding a minimum spanning tree 

Given the set of vectors {a^jiei^ and a complexity-reducing relation p, the problem is now 
to find an optimized computation of the element tensor by systematically exploring 
the complexity-reducing relation p. In [89], it was found that this problem has a simple 
solution. By constructing a weighted undirected graph G = {V, E) with vertices given by 
the vectors {a^jjgx^ and the weight at each edge given by the value of the complexity- 
reducing relation p evaluated at the pair of end-points, one may find an optimized (but not 
necessarily optimal) evaluation of the element tensor by computing the minimum spanning 
treejH G' = {V, E') for the graph G. 

The minimum spanning tree directly provides an algorithm for the evaluation of the 
element tensor A^ . If one first computes the entry of A^ corresponding to the root vertex 
of the minimum spanning tree, which may be done in \ A\ multiply-add pairs, the remaining 
entries may then be computed by traversing the tree (following the edges), either breadth- 
first or depth- first, and at each vertex computing the corresponding entry of A^ from the 
parent vertex at a cost given by the weight of the connecting edge. The total cost of 
computing the element tensor A^ is thus given by 

l^l + l^'l, (112) 

where \E'\ denotes the weight of the minimum spanning tree. As we shall see, computing 
the minimum spanning tree may significantly reduce the operation count, compared to the 
straightforward approach of Algorithm [3] for which the operation count is given by \Tk\ \ A\. 

7.2.5 A concrete example 

To demonstrate these ideas, we compute the minimum spanning tree for the computation 
of the 36 entries of the 6x6 element tensor for Poisson's equation with quadratic elements 
on triangles and obtain a reduction in the operation count from from a total of \Xk \ \ A\ = 
36 X 4 = 144 multiply-add pairs to less than 17 multiply-add pairs. Since there are 36 
entries in the element tensor, this means that we are be able to compute the element tensor 
in less than one operation per entry (ignoring the cost of computing the geometry tensor). 
As we saw above in Section [5.21 the rank four reference tensor is A^ is given by 

= / TT^TTV^ Vi elK Va € A, (113) 

where now Xk = [1,6]^ and A = [1,2]^. To compute the 6^ x 2^ = 144 entries of the 
reference tensor, we evaluate the set of integrals (|113|) for the basis defined in (|57|). In 
Table [20l we give the corresponding set of (scaled) vectors {a^jjgx^ displayed as a 6 x 6 
matrix of vectors with rows corresponding to the first component ii of the multiindex i 
and columns corresponding to the second component 12 of the multiindex i. Note that the 
entries in Table [20] have been scaled with a factor 6 for ease of notation (corresponding to 
the bilinear form a{v,U) = 6 J^Vv ■ VC/dx). Thus, the entries of the reference tensor are 
given by A^,^^^ = A^^^,^ = A^^,^^ = ^?i22 = 3/6 = 1/2, ^?2ii = V6, ^?2i2 = 0, etc. 



''A spanning tree for a graph G — {V,E) is any connected acyclic subgraph {V,E') of {V,E), that is, 
each vertex in V is connected to an edge m E' <Z E and there are no cycles. The (generally non-unique) 
mimmum spanning tree of a weighted graph G is a spanning tree G' — {V, E') that minimizes the sum 
of edge weights for E' . The minimum spanning tree may be computed using standard algorithms such as 
Kruskal's and Prim's algorithms, see |29) . 
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1 2 3 4 5 6 



1 

i 


[6, 6, 6, 6) 


[i, U, i, Uj 


[U, i, U, L) 


[U, U, U, Uj 


— {U, 4, U, 4j 


(AHA 
— (_4, U, 4, Uj 


2 


(1,1,0,0)T 


(3,0,0,0)^ 


-(0,1,0,0)^, 


(0,4,0,0)^ 


(0,0,0,0)^ 


-(4,4,0,0)T 


3 


(0,0,1,1)^ 


-(0,0,1,0)^ 


(0,0,0,3)^ 


(0,0,4,0)^ 


-(0,0,4,4)^ 


(0,0,0,0)T 


4 


(0,0,0,0)T 


(0,0,4,0)^ 


(0,4,0,0)^ 


(8,4,4,8)^ 


-(8,4,4,0)^ 


-(0,4,4,8)T 


5 


-(0,0,4,4)T 


(0,0,0,0)^ 


-(0,4,0, 4)T 


-(8,4,4,0)^ 


(8,4,4,8)^ 


(0,4,4,0)^ 


6 


-(4,4,0,0)^ 


-(4,0,4,0)^ 


(0,0,0,0)^ 


-(0,4,4,8)^ 


(0,4,4,0)^ 


(8,4,4,8)^ 



Table 20. The 6x6x2x2 reference tensor A" for Poisson's equation with 
quadratic elements on triangles, displayed here as the set of vec- 
tors {ttijieiK- 



Before proceeding to compute the minimum spanning tree for the 36 vectors in Ta- 
ble [20l we note that the element tensor for Poisson's equation is symmetric, and as 
a consequence we only need to compute 21 of the 36 entries of the element tensor. The 
remaining 15 entries are given by symmetry. Furthermore, since the geometry tensor Gk 
is symmetric (see Table [TT|) . it follows that 



where 



= "^aiGR + (^^12 + ^i2i)G'x + ^i22GK -dK: 

^0 _ ( aO aO , aO aO \T 

t: _ /'/-ill ^^12 rv22NT 
yK — {^K ) ) I ■ 



(114) 



(115) 



As a consequence, each of the 36 entries of the element tensor may be obtained in at 
most 3 multiply-add pairs, and since only 21 of the entries need to be computed, the total 
operation count is directly reduced from 144 to 21 x 3 = 63. 

The set of symmetry-reduced vectors {a^i, 0^2, • • • , agg} are given in Table [211 We 
immediately note a number of complexity-reducing relations between the vectors. Entries 
a?2 = (1,1,0)"^, a?6 = (-4, -4,0)'^,a^6 = (-4,-4,0)"^ and a^g = (-8, -8, 0)"^ are collinear, 
entries 044 = (8, 8, 8)^ and 045 = (—8, —8, 0)^ are close in Hamming distanc^ etc. 

To systematically explore these dependencies, we form a weighted graph G = (V, E) and 
compute a minimum spanning tree. We let the vertices V be the set of symmetry-reduced 
vectors, V = {a^i, 0^2, • • • , Oeel) form the set of edges E by adding between each pair 
of vertices an edge with weight given by the minimum of all complexity-reducing relations 
between the two vertices. The resulting minimum spanning tree is shown in Figure [H We 
note that the total edge weight of the minimum spanning tree is 14. This means that once 



We use an extended concept of Hamming distance by allowing an optional negation of vectors (which 
is cheap to compute). 
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1 2 3 4 5 6 



1 


(3,6,3)^ 


(1,1,0)^ 


(0,1,1)^ 


(0,0, 0)T 


-(0,4,4)T 


-(4,4,0)T 


2 




(3,0,0)T 


-(0,1,0)T, 


(0,4,0)T 


(0,0, 0)T 


-(4,4,0)T 


3 






(0,0, 3)T 


(0,4,0)T 


-(0,4,4)T 


(0,0, 0)T 


4 








(8,8,8)^ 


-(8,8,0)T 


-(0,8,8)^ 


5 










(8,8,8)^ 


(0,8,0)^ 


6 












(8,8,8)^ 



Table 21. The upper triangular part of the symmetry-reduced reference tensor A' 
for Poisson's equation with quadratic elements on triangles. 



the value of the entry in the element tensor corresponding to the root vertex is known, 
the remaining entries may be computed in at most 14 multiply-add pairs. Adding the 3 
multiply-add pairs needed to compute the root entry, we thus find that all 36 entries of the 
element tensor may be computed in at most 17 multiply-add pairs. 

An optimized algorithm for the computation of the element tensor may then be 
found by starting at the root vertex and computing the remaining entries by traversing the 
minimum spanning tree, as demonstrated in Algorithm HI Note that there are several ways 
to traverse the tree. In particular, it is possible to pick any vertex as the root vertex and 
start from there. Furthermore, there are many ways to traverse the tree given the root 
vertex. Algorithm U] is generated by traversing the tree breadth-first, starting at the root 
vertex 044 = (8,8,8)^. Finally, we note that the operation count may be further reduced 
by not counting multiplications with zeros and ones. 



Algorithm 4 An optimized (but not optimal) algorithm for computing the upper triangular 
part of the element tensor A^ for Poisson's equation with quadratic elements on triangles 



in 17 multiply-add pairs. 



^44 


— Al^iiG 


K + (^4412 " 


1" ^4421 " 


I 40 (-122 
r ^4422 "-^/f 


^13 


— ~^23 + 1^*22 




- A^ 4. 

— ^44 ^ 








A'^ 
^14 


= 0Af3 






- A^ 4- 

— ^44 ^ 


-8Gf 






A^^ 
^34 


- A^^ 

— ^24 






- A^^ 

— ^44 








A^^ 

^15 


= -4.4 


K 
13 


^66 


- A^^ 








^25 


- 4-^ 


A^ 


- A^ 

— ^45 


' 8G)^ 






4-^ 

^22 


- 4-^ 


f 3Gfi 


A^ 


- 8^45 








4-^ 

^33 


- 4-^ 

— ^14 


f 3G22 


A^ 

^16 


- 1 A^ 

— 2 "^45 








A^ 
^36 


— ^14 




A^ 

^23 


= -A^2 4 








A^ 
^35 


- A^^ 

— ^15 




^24 


- A^ 

— ^16 


4G]^ 






A^^ 
^11 




f 6Gf2 + 3G2^2 


A^ 
^26 
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Figure 6. The minimum spanning tree for the optimized computation of the upper 
triangular part (Table I21|l of the element tensor for Poisson's equation 
with quadratic elements on triangles. Solid (blue) lines indicate zero 
Hamming distance (equality), dashed (blue) lines indicate a small but 
nonzero Hamming distance and dotted (red) lines indicate coUinearity. 
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7.2.6 Extensions 

By use of symmetry and relations between subsets of the reference tensor we have seen 
that it is possible to significantly reduce the operation count for the computation of the 
tensor contraction = AP : Gk ■ We have here only discussed the use of binary relations 
(collinearity and Hamming distance) but further reductions may be made by considering 
ternary relations, such as coplanarity, and higher-arity relations between the vectors. 

8 AUTOMATION AND SOFTWARE ENGINEERING 



In this section, we comment briefly on some topics of software engineering relevant to the 
automation of the finite element method. A number of books and papers have been written 
on the subject of software engineering for the implementation of finite element methods, 
see for example [92^ El I1U4^ llU5j . In particular these works point out the importance of 
object-oriented, or concept-oriented, design in developing mathematical software; since the 
mathematical concepts have already been hammered out, it may be advantageous to reuse 
these concepts in the system design, thus providing abstractions for important concepts, 
including Vector, Matrix, Mesh, Function, BilinearForm, LinearForm, FiniteElement 
etc. 

We shall not repeat these arguments, but instead point out a couple of issues that might 
be less obvious. In particular, a straightforward implementation of all the mathematical 
concepts discussed in the previous sections may be difficult or even impossible to attain. 
Therefore, we will argue that a level of automation is needed also in the implementation or 
realization of an automation of the finite element method, that is, the automatic generation 
of computer code for the specific mathematical concepts involved in the specification of any 
particular finite element method and differential equation, as illustrated in Figure [71 




Figure 7. A machine (computer program) that automates the finite element 
method by automatically generating a particular machine (computer 
program) for a suitable subset of the given input data. 



We also point out that the automation of the finite element method is not only a software 
engineering problem. In addition to identifying and implementing the proper mathematical 
concepts, one must develop new mathematical tools and ideas that make it possible for the 
automating system to realize the full generality of the finite element method. In addition, 
new insights are needed to build an efficient automating system that can compete with or 
outperform hand-coded specialized systems for any given input. 
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8.1 Code Generation 

As in all types of engineering, software for scientific computing must try to find a suitable 
trade-off between generality and efficiency; a software system that is general in nature, that 
is, it accepts a wide range of inputs, is often less efficient than another software system that 
performs the same job on a more limited set of inputs. As a result, most codes used by 
practitioners for the solution of differential equations are very specific, often specialized to 
a specific method for a specific differential equation. 

However, by using a compiler approach, it is possible to combine generality and efficiency 
without loss of generality and without loss of efficiency. Instead of developing a potentially 
inefficient general-purpose program that accepts a wide range of inputs, a suitable subset of 
the input is given to an optimizing compiler that generates a specialized program that takes 
a more limited set of inputs. In particular, one may automatically generate a specialized 
simulation code for any given method and differential equation. 

An important task is to identify a suitable subset of the input to move to a precompi- 
lation phase. In the case of a system automating the solution of differential equations by 
the finite element method, a suitable subset of input includes the variational problem (j29|) 
and the choice of approximating finite element spaces. We thus develop a domain-specific 
compiler that accepts as input a variational problem and a set of finite elements and gen- 
erates optimized low-level code (in some general-purpose language such as C or C-I--I-). 
Since the compiler may thus work on only a small family of inputs (multilinear forms), 
domain-specific knowledge allows the compiler to generate very efficient code, using the 
optimizations discussed in the previous section. We return to this in more detail below in 
Section [9.21 when we discuss the FEniCS Form Compiler FFC. 

We note that to limit the complexity of the automating system, it is important to 
identify a minimal set of code to be generated at a precompilation stage, and implement 
the remaining code in a general-purpose language. It makes less sense to generate the code 
for administrative tasks such as reading and writing data to file, special algorithms like 
adaptive mesh refinement etc. These tasks can be implemented as a library in a general- 
purpose language. 

8.2 Just-In-Time Compilation 

To make an automating system for the solution of differential equations truly useful, the 
generation and precompilation of code according to the above discussion must also be 
automated. Thus, a user should ultimately be presented with a single user- interface and 
the code should automatically and transparently be generated and compiled just-in-time 
for a given problem specification. 

Achieving just-in-time compilation of variational problems is challenging, not only to 
construct the exact mechanism by which code is generated, compiled and linked back in at 
run-time, but also to reduce the precompilation phase to a minimum so that the overhead 
of code generation and compilation is acceptable. To compile and generate the code for the 
evaluation of a multilinear form as discussed in Section [5l we need to compute the tensor 
representation ()80p . including the evaluation of the reference tensor. Even with an optimized 
algorithm for the computation of the reference tensor as discussed in [88], the computation of 
the reference tensor may be very costly, especially for high-order elements and complicated 
forms. To improve the situation, one may consider caching previously computed reference 
tensors (similarly to how I^T^;i]X generates and caches fonts in different resolutions) and 
reuse previously computed reference tensors. As discussed in [88], a reference tensor may 
be uniquely identified by a (short) string referred to as a signature. Thus, one may store 
reference tensors along with their signatures to speed up the precomputation and allow 
run-time just-in-time compilation of variational problems with little overhead. 
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9 A PROTOTYPE IMPLEMENTATION (FEniCS) 

An algorithm must be seen to be believed, and the best way 
to learn what an algorithm is all about is to try it. 

Donald E. Knuth 
The Art of Computer Programming (1968) 

The automation of the finite element method includes its own realization, that is, a 
software system that implements the algorithms discussed in Sections [SHEl Such a system 
is provided by the FEniCS project [6^ I36j. We present below some of the key components 
of FEniCS, including FIAT, FFC and DOLFIN, and point out how they relate to the 
various aspects of the automation of the finite element method. In particular, the automatic 
tabulation of finite element basis functions discussed in Section [J] is provided by FIAT \83\ 
[821 , the automatic evaluation of the element tensor as discussed in Section [5] is provided 
by FFC [Ml EH ISSl 122] and the automatic assembly of the discrete system as discussed 
in Section [6] is provided by DOLFIN [621 [681 ES]- The FEniCS project thus serves as a 
testbed for development of new ideas for the automatic and efficient implementation of 
finite element methods. At the same, it provides a reference implementation of these ideas. 

FEniCS software is free software [55]. In particular, the components of FEniCS are 
licensed under the GNU General Public License [S3]l£l The source code is freely available 
on the FEniCS web site [60] and the development is discussed openly on public mailing 
lists. 

9.1 FIAT 

The Finite element Automatic Tabulator FIAT [83] was first introduced in [82] and imple- 
ments the ideas discussed above in Section [H for the automatic tabulation of finite element 
basis functions based on a linear algebraic representation of function spaces and constraints. 

FIAT provides functionality for defining finite element function spaces as constrained 
subsets of polynomials on the simplices in one, two and three space dimensions, as well 
as a library of predefined finite elements, including arbitrary degree Lagrange [271 120] , 
Hermite [271 HD], Raviart-Thomas [m], Brezzi-Douglas-Marini [22] and Nedelec [T06] 
elements, as well as the (first degree) Crouzeix-Raviart element [31]. Furthermore, the 
plan is to support Brezzi-Douglas-Fortin-Marini [23] and Arnold- Winther [1] elements in 
future versions. 

In addition to tabulating finite element nodal basis functions (as linear combinations of 
a reference basis), FIAT generates quadrature points of any given order on the reference 
simplex and provides functionality for efficient tabulation of the basis functions and their 
derivatives at any given set of points. In Figure [S] and Figure [21 we present some examples 
of basis functions generated by FIAT. 

Although FIAT is implemented in Python, the interpretive overhead of Python com- 
pared to compiled languages is small, since the operations involved may be phrased in terms 
of standard linear algebra operations, such as the solution of linear systems and singular 
value decomposition, see [SI] . FIAT may thus make use of optimized Python linear algebra 
libraries such Python Numeric |107j . Recently, a C++ version of FIAT called FIAT++ has 
also been developed with run-time bindings for Sundance [1031 llOll I102j . 

9.2 FFC 

The FEniCS Form Compiler FFC [98], first introduced in [87|, automates the evaluation of 
multilinear forms as outlined in Section [5] by automatically generating code for the efficient 



^FIAT is licensed under the Lesser General Public License [54) . 
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Figure 9. A basis function associated with an interior point for a fifth-degree 
Lagrange finite element on a triangle. (Courtesy of Robert C. Kirby.) 
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computation of the element tensor corresponding to a given multilinear form. FFC thus 
functions as domain-specific compiler for multilinear forms, taking as input a set of discrete 
function spaces together with a multilinear form defined on these function spaces, and 
produces as output optimized low-level code, as illustrated in Figure [lOl In its simplest 
form, FFC generates code in the form of a single C-|— |- header file that can be included in 
a C-|— |- program, but FFC can also be used as a just-in-time compiler within a scripting 
environment like Python, for seamless definition and evaluation of multilinear forms. 



a{v,U) 



Vv ■ VU dx 




A 



Poisson.h 



Figure 10. The form compiler FFC takes as input a multilinear form together with 
a set of function spaces and generates optimized low-level (C-|— 1-) code 
for the evaluation of the associated element tensor. 



9.2.1 Form language 

The FFC form language is generated from a small set of basic data types and operators that 
allow a user to define a wide range of multilinear forms, in accordance with the discussion 
of Section 15.4.31 As an illustration, we include below the complete definition in the FFC 
form language of the bilinear forms for the test cases considered above in Tables [T0lll3[ 
We refer to the FFC user manual [99] for a detailed discussion of the form language, but 
note here that in addition to a set of standard operators, including the inner product dot, 
the partial derivative D, the gradient grad, the divergence div and the rotation rot, FFC 
supports Einstein tensor-notation (Table [M]) and user-defined operators (operator epsilon 
in Table [25]). 

element = FiniteElement ("Lagrange" , "tetrahedron", 1) 

V = BasisFunction(element) 
U = BasisFunction(elenient) 



a = v*U*dx 



Table 22. The complete definition of the bilinear form a{v, U) = J^vU dx in the 
FFC form language (test case 1). 



9.2.2 Implementation 

The FFC form language is implemented in Python as a collection of Python classes (in- 
cluding BasisFunction, Function, FiniteElement etc.) and operators on theses classes. 
Although FFC is implemented in Python, the interpretive overhead of Python has been 
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element = FiniteElement ("Lagrange" , "tetrahedron", 1) 

V = BasisFunction(element) 
U = BasisFunction(element) 

a = dot(grad(v), grad(U))*dx 



Table 23. The complete definition of the bihnear form a{v, U) = Vv ■ V(7 da; 
in the FFC form language (test case 2). 



element = FiniteElement ("Vector Lagrange", "tetrahedron", 1) 

V = BasisFunction(element) 
U = BasisFunction(element) 
w = Function(element) 

a = v[i]*w[j]*D(U[i] , j)*dx 



Table 24. The complete definition of the bilinear form a{v, U) = JqI^- {w-\/)U dx 
in the FFC form language (test case 3). 



element = FiniteElement ("Vector Lagrange", "tetrahedron", 1) 

V = BasisFunction(element) 
U = BasisFunction(element) 

def epsilon(v) : 

return 0.5*(grad(v) + transp(grad(v) ) ) 

a = dot (epsilon(v) , epsilon(U) ) *dx 



Table 25. The complete definition of the bilinear form a{v, U) — Jj^ e{v) : e(U) dx 
in the FFC form language (test case 4). 
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minimized by judicious use of optimized numerical libraries such as Python Numeric |107j . 
The computationally most expensive part of the compilation of a multilinear form is the 
precomputation of the reference tensor. As demonstrated in [SS], by suitably pretabulating 
basis functions and their derivatives at a set of quadrature points (using FIAT), the ref- 
erence tensor can be computed by assembling a set of outer products, which may each be 
efficiently computed by a call to Python Numeric. 

Currently, the only optimization FFC makes is to avoid multiplications with any zeros of 
the reference tensor when generating code for the tensor contraction = A^ : Gk- As 
part of the FEniCS project, an optimizing backend, FErari (Finite Element Re-arrangement 
Algorithm to Reduce Instructions), is currently being developed. Ultimately, FFC will 
call FErari at compile-time to find an optimized computation of the tensor contraction, 
according to the discussion in Section [71 

9.2.3 Benchmark results 

As a demonstration of the efficiency of the code generated by FFC, we include in Table [26] 
a comparison taken from [S7j between a standard implementation, based on computing the 
element tensor A^ on each cell K hy a. loop over quadrature points, with the code automat- 
ically generated by FFC, based on precomputing the reference tensor A^ and computing 
the element tensor A^ by the tensor contraction A^ = AP : Gk on each cell. 
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Table 26. Speedups for test cases 1-4 (Tables HMTSl and Tables [221125]) in two 

and three space dimensions. 



As seen in Table [26l the speedup ranges between one and three orders of magnitude, 
with larger speedups for higher degree elements. In Figure [TT] and Figure [T2l we also plot 
the dependence of the speedup on the polynomial degree for test cases 1 and 2 respectively. 

It should be noted that the total work in a simulation also includes the assembly of the 
local element tensors {A^}KeT into the global tensor A, solving the linear system, iterating 
on the nonlinear problem etc. Therefore, the overall speedup may be significantly less than 
the speedups reported in Table [26] We note that if the computation of the local element 
tensors normally accounts for a fraction 9 S (0, 1) of the total run-time, then the overall 
speedup gained by a speedup of size s > 1 for the computation of the element tensors will 
be 

which is significant only if 9 is significant. As noted in [85], 9 may be significant in many 
cases, in particular for nonlinear problems where a nonlinear system (or the action of a 
linear operator) needs to be repeatedly reassembled as part of an iterative method. 
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Figure 11. Benchmark results for test case 1, the mass matrix, specified in FFC 
by a = v*U*dx. 




Figure 12. Benchmark results for test case 2, Poisson's equation, specified in FFC 
by a = dot(grad(v), grad(U))*dx. 
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9.2.4 User interfaces 

FFC can be used either as a stand-alone compiler on the command-line, or as a Python 
module from within a Python script. In the first case, a multilinear form (or a pair of 
bilinear and linear forms) is entered in a text file with suffix .form and then compiled by 
calling the command ff c with the form file on the command- line. 

By default, FFC generates C++ code for inclusion in a DOLFIN C++ program (see 
Section 19.31 below) but FFC can also compile code for other backends (by an appropriate 
compiler flag), including the ASE (ANL SIDL Environment) format [HI], XML format, and 
KTgX format (for inclusion of the tensor representation in reports and presentations). The 
format of the generated code is separated from the parsing of forms and the generation of 
the tensor contraction, and new formats for alternative backends may be added with little 
effort, see Figure [T3l 



Format 



N 

poisson .form F 



Parser 




Compiler 














... J...T " 


- 1 





N 

Poisson. h 



FIAT 



FErari 



Figure 13. Component diagram for FFC. 



Alternatively, FFC can be used directly from within Python as a Python module, al- 
lowing definition and compilation of multilinear forms from within a Python script. If used 
together with the recently developed Python interface of DOLFIN (PyDOLFIN), FFC 
functions as a just-in-time compiler for multilinear forms, allowing forms to be defined and 
evaluated from within Python. 

9.3 DOLFIN 

DOLFIN [621 1681 [63] , Dynamic Object-oriented Library for FINite element computation, 
functions as a general programming interface to DOLFIN and provides a problem-solving 
environment (PSE) for differential equations in the form of a C++/Python class library. 

Initially, DOLFIN was developed as a self-contained (but modularized) C++ code for 
finite element simulation, providing basic functionality for the definition and automatic 
evaluation of multilinear forms, assembly, linear algebra, mesh data structures and adaptive 
mesh refinement, but as a consequence of the development of focused components for each 
of these tasks as part of the FEniCS project, a large part (but not all) of the functionality 
of DOLFIN has been delegated to these other components while maintaining a consistent 
programming interface. Thus, DOLFIN relies on FIAT for the automatic tabulation of finite 
element basis functions and on FFC for the automatic evaluation of multilinear forms. We 
discuss below some of the key aspects of DOLFIN and its role as a component of the FEniCS 
project. 
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9.3.1 Automatic assembly of the discrete system 

DOLFIN implements the automatic assembly of the discrete system associated with a given 
variational problem as outlined in Section O DOLFIN iterates over the cells {K}k(^j- of a 
given mesh T and calls the code generated by FFC on each cell K to evaluate the element 
tensor . FFC also generates the code for the local-to-global mapping which DOLFIN 
calls to obtain a rule for the addition of each element tensor to the global tensor A. 

Since FFC generates the code for both the evaluation of the element tensor and for 
the local-to-global mapping, DOLFIN needs to know very little about the finite element 
method. It only follows the instructions generated by FFC and operates abstractly on the 
level of Algorithm [21 

9.3.2 Meshes 

DOLFIN provides basic data structures and algorithms for simplicial meshes in two and 
three space dimensions (triangular and tetrahedral meshes) in the form of a class Mesh, 
including adaptive mesh refinement. As part of PETSc [9l O [10] and the FEniCS project, 
the new component Sieve |8H [80] is currently being developed. Sieve generalizes the mesh 
concept and provides powerful abstractions for dimension-independent operations on mesh 
entities and will function as a backend for the mesh data structures in DOLFIN. 

9.3.3 Linear algebra 

Previously, DOLFIN provided a stand-alone basic linear algebra library in the form of a 
class Matrix, a class Vector and a collection of iterative and direct solvers. This imple- 
mentation has recently been replaced by a set of simple wrappers for the sparse linear 
algebra library provided by PETSc [9l[8l[T0]. As a consequence, DOLFIN is able to provide 
sophisticated high-performance parallel linear algebra with an easy-to-use object-oriented 
interface suitable for finite element computation. 

9.3.4 ODE solvers 

DOLFIN also provides a set of general order mono-adaptive and multi-adaptive |94[ [95] [50] 
Wl\ IIUO] ODE-solvers, automating the solution of ordinary differential equations. Although 
the ODE-solvers may be used in connection with the automated assembly of discrete sys- 
tems, DOLFIN does currently not provide any level of automation for the discretization of 
time-dependent PDEs. Future versions of DOLFIN (and FFC) will allow time-dependent 
PDEs to be defined directly in the FFC form language with automatic discretization and 
adaptive time-integration. 

9.3.5 PDF solvers 

In addition to providing a class library of basic tools that automate the implementation 
of adaptive finite element methods, DOLFIN provides a collection of ready-made solvers 
for a number of standard equations. The current version of DOLFIN provides solvers for 
Poisson's equation, the heat equation, the convection-diffusion equation, linear elasticity, 
updated large-deformation elasticity, the Stokes equations and the incompressible Navier- 
Stokes equations. 

9.3.6 Pre- and post-processing 

DOLFIN relies on interaction with external tools for pre-processing (mesh generation) and 
post-processing (visualization). A number of output formats are provided for visualization, 
including DOLFIN XML [63], VTK [90] (for use in ParaView HH] or MayaVi plO] ). 
Octave [37], MATLAB [Il8], OpenDX [I], GiD [28] and Tecplot [2]. DOLFIN may also be 
easily extended with new output formats. 
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9.3.7 User interfaces 

DOLFIN can be accessed either as a C++ class library or as a Python module, with the 
Python interface generated semi-automatically from the C++ class library using SWIG \15\ 
I14| . In both cases, the user is presented with a simple and consistent but powerful pro- 
gramming interface. 

As discussed in Section [2331 DOLFIN provides a set of ready-made solvers for standard 
differential equations. In the simplest case, a user thus only needs to supply a mesh, a set 
of boundary conditions and any parameters and variable coefficients to solve a differential 
equation, by calling one of the existing solvers. For other differential equations, a solver 
may be implemented with minimal effort using the set of tools provided by the DOLFIN 
class library, including variational problems, meshes and linear algebra as discussed above. 

9.4 Related Components 

We also mention two other projects developed as part of FEniCS. One of these is Puffin [701 
[69], a light-weight educational implementation of the basic functionality of FEniCS for 
Octave/MATLAB, including automatic assembly of the linear system from a given vari- 
ational problem. Puffin has been used with great success in introductory undergraduate 
mathematics courses and is accompanied by a set of exercises [79] developed as part of the 
Body and Soul reform project [IH] [H] [JJl [JO] for applied mathematics education. 

The other project is the Ko mechanical simulator [76] . Ko uses DOLFIN as the com- 
putational backend and provides a specialized interface to the simulation of mechanical 
systems, including large-deformation elasticity and collision detection. Ko provides two 
different modes of simulation: either a simple mass-spring model solved as a system of 
ODEs, or a large-deformation updated elasticity model [78] solved as a system of time- 
dependent PDEs. As a consequence of the efficient assembly provided by DOLFIN, based 
on efficient code being generated by FFC, the overhead of the more complex PDF model 
compared to the simple ODE model is relatively small. 

10 EXAMPLES 

In this section, we present a number of examples chosen to illustrate various aspects of 
the implementation of finite element methods for a number of standard partial differential 
equations with the FEniCS framework. We already saw in Section[2]the specification of the 
variational problem for Poisson's equation in the FFC form language. The examples below 
include static linear elasticity, two different formulations for the Stokes equations and the 
time-dependent convection-diffusion equations with the velocity field given by the solution 
of the Stokes equations. For simplicity, we consider only linear problems but note that the 
framework allows for implementation of methods for general nonlinear problems. See in 
particular [78] and [651IMI66]. 

10.1 Static Linear Elasticity 

As a first example, consider the equation of static linear elasticity [20] for the displacement 
u = u{x) of an elastic shape Q. € M*^, 

-V-a{u) = f inn, 

u = uq on Fq C dQ, (117) 
a{u)h = on dVl \ Fq, 

where n denotes a unit vector normal to the boundary dO.. The stress tensor a is given by 



a{v) = 2fie{v) + Atrace(e(?;))/, 



(118) 
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where / is the d x d identity matrix and where the strain tensor e is given by 

e{v) = ^(Vv+{Vvy), (119) 
that is, eij{v) = \i§^ + for i,j = l,...,d. The Lame constants fi and A are given by 

with E the Young's modulus of elasticity and the Poisson ratio, see [121) . In the example 
below, we take £" = 10 and = 0.3. 

To obtain the discrete variational problem corresponding to (|117p . we multiply with a 
test function v in a suitable discrete test space and integrate by parts to obtain 



[ Vv:a{U)dx= [ v ■ f dx ^veVh- (121) 
Jn Jn 



The corresponding formulation in the FFC form language is shown in Table [27] for an 
approximation with linear Lagrange elements on tetrahedra. Note that by defining the 
operators a and e, it is possible to obtain a very compact notation that corresponds well 
with the mathematical notation of (|12ip . 



element = FiniteElement ( "Vector Lagrange", "tetrahedron", 1) 

V = BasisFunction (element) 
U = BasisFunction(element) 
f = Function(element) 

E =10.0 
nu = 0.3 

mu = E / (2*(1 + nu)) 

Imbda = E*nu / ((1 + nu)*(l - 2*nu)) 

def epsilon(v) : 

return 0.5*(grad(v) + transp(grad(v) ) ) 

def sigma(v) : 

return 2*mu*epsilon(v) + lmbda*mult (trace (epsilon(v) ) , Identity (len(v) ) ) 

a = dot(grad(v), sigma(U))*dx 
L = dot(v, f)*dx 



Table 27. The complete specification of the variational problem (|121|l for static 
linear elasticity in the FFC form language. 



Computing the solution of the variational problem for a domain given by a gear, 
we obtain the solution in Figure [TH The gear is clamped at two of its ends and twisted 
30 degrees, as specified by a suitable choice of Dirichlet boundary conditions on Tq. 
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Figure 14. The original domain fl of the gear (above) and the twisted gear (below), 
obtained by displacing SI at each point x £ Qhy the value of the solution 
u of (|117[l at the point x. 
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10.2 The Stokes Equations 

Next, we consider tlie Stokes equations, 

-An + Vp = f in 0, 

V -u = inn, (122) 
u = uq on dQ, 

for the velocity field u = u{x) and the pressure p = p{x) in a highly viscous medium. By 
multiplying the two equations with a pair of test functions (v, q) chosen from a suitable 
discrete test space Vh = Vj^ x V^, we obtain the discrete variational problem 

[ Vv.VU -{V ■v)P + qV ■Udx= [ V fdx y{v,q)€Vh. (123) 
Jn Jn 

for the discrete approximate solution (U, P) £ Vh = Vf^ x V^. To guarantee the existence of 

a unique solution of the discrete variational problem p23|) . the discrete function spaces Vh 
and Vh must be chosen appropriately. The Babuska-Brezzi [SJ [21] inf-sup condition gives 
a precise condition for the selection of the approximating spaces. 

10.2.1 Tayloi — Hood elements 

One way to fulfill the Babuska-Brezzi condition is to use different order approximations for 
the velocity and the pressure, such as degree q polynomials for the velocity and degree q — 1 
for the pressure, commonly referred to as Taylor-Hood elements, see |191I20| . The resulting 
mixed formulation may be specified in the FFC form language by defining a Taylor-Hood 
element as the direct sum of a degree q vector- valued Lagrange element and a degree q — 1 
scalar Lagrange element, as shown in Table [28l Figure [16] shows the velocity field for the 
fiow around a two-dimensional dolphin computed with a P2-P1 Taylor-Hood approximation. 



P2 = FiniteElement ("Vector Lagrange", "triangle", 2) 
PI = FiniteElement ("Lagrange" , "triangle", 1) 
TH = P2 + PI 

(v, q) = BasisFunctions (TH) 
(U, P) = BasisFunctions (TH) 

f = Function (P2) 

a = (dot(grad(v) , grad(U)) - div(v)*P + q*div(U))*dx 
L = dot(v, f)*dx 



Table 28. The complete specification of the variational problem (|123p for the 
Stokes equations with P2-P1 Taylor-Hood elements. 

10.2.2 A stabilized equal- order formulation 

Alternatively, the Babuska-Brezzi condition may be circumvented by an appropriate modi- 
fication (stabilization) of the variational problem (jl23p . In general, an appropriate modifica- 
tion may be obtained by a Galer kin/least-squares (GLS) stabilization, that is, by modifying 
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the test function w = {v,q) according to w w + 6 Aw, where A is the operator of the 
differential equation and 6 = 6{x) is suitable stabilization parameter. Here, we a choose 
simple pressure-stabilization obtained by modifying the test function w = {v, q) according 
to 

{v,q) ^{v,q) + {6Vq,0). (124) 

The stabilization ()124p is sometimes referred to as a pressure-stabilizing/Petrov-Galerkin 
(PSPG) method, see \72\ 156] , Note that the stabilization (jl24p may also be viewed as a 
reduced GLS stabilization. 

We thus obtain the following modified variational problem: Find {U, P) (zVh such that 

/ Vv:WU-{V-v)P + qV-U + 6Vq-VPdx= [ {v + 6Vq)-fdx y{v,q)eVh. (125) 
Jn Jn 

Table [29] shows the stabilized equal-order method in the FFC form language, with the 
stabilization parameter given by 

6 = /3/l^ (126) 
where f3 = 0.2 and h = h{x) is the local mesh size (cell diameter). 



vector = FiniteElement ( "Vector Lagrange", "triangle", 1) 
scalar = FiniteElement ( "Lagrange" , "triangle", 1) 
system = vector + scalar 

(v, q) = BasisFunctions (system) 
(U, P) = BasisFunctions (system) 

f = Function(vector) 
h = Function(scalar) 

d = 0.2*h*h 

a = (dot(grad(v) , grad(U)) - div(v)*P + q*div(U) + d*dot (grad(q) , grad(P)))*dx 
L = dot(v + mult(d, grad(q)), f)*dx 



Table 29. The complete specification of the variational problem (|125|) for the 
Stokes equations with an equal-order Pi-Pi stabilized method. 

In Figure [TSl we illustrate the importance of stabilizing the equal-order method by 
plotting the solution for the pressure with and without stabilization. Without stabilization, 
the solution oscillates heavily. Note that the scaling is chosen differently in the two images, 
with the oscillations scaled down by a factor two in the unstabilized solution. The situation 
without stabilization is thus even worse than what the figure indicates. 

10.3 Convection Diffusion 

As a final example, we compute the temperature u = u{x, t) around the dolphin (Figure [T7|) 
from the previous example by solving the time-dependent convection-diffusion equations, 

ii + b-Wu-V ■ (cVu) = f inOx(0,r], 

u = UQ ondnx{0,T], (127) 
It = no at X {0}, 




Figure 15. The pressure for the flow around a two-dimensional dolphin, obtained 
by solving the Stokes equations (|122[) by an unstabilized Pi -Pi approx- 
imation (above) and a stabilized Pi -Pi approximation (below). 
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Figure 16. The velocity field for the fiow around a two-dimensional dolphin, ob- 
tained by solving the Stokes equations (|122p by a P2-P1 Taylor-Hood 
approximation. 




Figure 17. The temperature around a hot dolphin in surrounding cold water with 
a hot infiow, obtained by solving the convection-diffusion equation with 
the velocity field obtained fi-om a solution of the Stokes equations with 
a P2-P1 Taylor-Hood approximation. 
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with velocity field b = h{x) obtained by solving the Stokes equations. 

We discretize (jl27p with the cG(l)cG(l) method, that is, with continuous piecewise 
linear functions in space and time (omitting stabilization for simplicity). The interval [0, T] 
is partitioned into a set of time intervals = to < < ■ ■ ■ < in-i < tn < ■ ■ ■ < ihi = T and 
on each time interval {tn-i,tr^, we pose the variational problem 

rtn r rtn r 

/ {v,U) + vb-VU + cVvVUdxdt= vfdxdt VveVh, (128) 

with Vh the space of all continuous piecewise linear functions in space. Note that the 
cG(l) method in time uses piecewise constant test functions, see [TH [73l [511 [39l IM] • As a 
consequence, we obtain the following variational problem for [/" Vh = Vh, the piecewise 
linear in space solution at time t = tn, 

/ V + vb- V(C/" + U"-^)/2 + cVv V([/" + C/"-i)/2dx 

V f dxdt \/v ^ Vh, 



tn-l ''n 



where = — tn-i is the size of the time step. We thus obtain a variational problem of 
the form 

a{v,U'') = L{v) yv£Vh, (130) 

where 

a(w,C/")= / vU''dx + — {vb-VU'' + cVv-VU'') dx, 
Jn 2 

L{v)= I vU''-^dx- — {vb-VU''-^+cVvVU'''^)dx+ [" [ vfdxdt. 
Jn 2 Jtn-i Jn 

(131) 

The corresponding specification in the FFC form language is presented in Table \3U[ 
where for simplicity we approximate the right-hand side with its value at the right end- 
point. 

11 OUTLOOK: THE AUTOMATION OF CMM 

The automation of the finite element method, as described above, constitutes an impor- 
tant step towards the Automation of Computational Mathematical Modeling (ACMM), as 
outlined in [96]. In this context, the automation of the finite element method amounts to 
the automation of discretization, that is, the automatic translation of a given continuous 
model to a system of discrete equations. Other key steps include the automation of discrete 
solution, the automation of error control, the automation of modeling and the automation 
of optimization. We discuss these steps below and also make some comments concerning 
automation in general. 

11.1 The Principles of Automation 

An automatic system carries out a well-defined task without intervention from the person or 
system actuating the automatic process. The task of the automating system may be formu- 
lated as follows: For given input satisfying a fixed set of conditions (the input conditions), 
produce output satisfying a given set of conditions (the output conditions). 
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scalar = FiniteElement ("Lagrange" , "triangle", 1) 
vector = FiniteElement ("Vector Lagrange", "triangle", 2) 

V = BasisFunction(scalar) 
Ul = BasisFunction(scalar) 
UO = Function(scalar) 
b = Function(vector) 
f = Function(scalar) 

c = 0.005 
k = 0.05 

a = v*Ul*dx + 0.5*k*(v*dot(b, grad(Ul)) + c*dot (grad(v) , grad(Ul) ) ) *dx 

L = v*UO*dx - 0.5*k*(v*dot(b, grad(UO)) + c*dot (grad(v) , grad(UO) ) ) *dx + k*v*f*dx 



Table 30. The complete specification of tlie variational problem (|130p for cG(l) 
time-stepping of the convection-diffusion equation. 



An automatic process is defined by an algorithm, consisting of a sequential list of instruc- 
tions (like a computer program). In automated manufacturing, each step of the algorithm 
operates on and transforms physical material. Correspondingly, an algorithm for the Au- 
tomation of CMM operates on digits and consists of the automated transformation of digital 
information. 

A key problem of automation is the design of a feed-back control, allowing the given 
output conditions to be satisfied under variable input and external conditions, ideally at a 
minimal cost. Feed-back control is realized through measurement, evaluation and action; 
a quantity relating to the given set of conditions to be satisfied by the output is measured, 
the measured quantity is evaluated to determine if the output conditions are satisfied or 
if an adjustment is necessary, in which case some action is taken to make the necessary 
adjustments. In the context of an algorithm for feed-back control, we refer to the evaluation 
of the set of output conditions as the stopping criterion, and to the action as the modification 
strategy. 

A key step in the automation of a complex process is modularization, that is, the hier- 
archical organization of the complex process into components or sub processes. Each sub 
process may then itself be automated, including feed-back control. We may also express this 
as abstraction, that is, the distinction between the properties of a component (its purpose) 
and the internal workings of the component (its realization). 

Modularization (or abstraction) is central in all engineering and makes it possible to 
build complex systems by connecting together components or subsystems without concern 
for the internal workings of each subsystem. The exact partition of a system into compo- 
nents is not unique. Thus, there are many ways to partition a system into components. In 
particular, there are many ways to design a system for the Automation of Computational 
Mathematical Modeling. 

We thus identify the following basic principles of automation: algorithms, feed-back 
control, and modularization. 
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11.2 Computational Mathematical Modeling 

In automated manufacturing, the task of the automating system is to produce a certain 
product (the output) from a given piece of material (the input), with the product satisfying 
some measure of quality (the output conditions). 



A{u) = f 



TOL > 




U 



Figure 18. The Automation of Computational Mathematical Modeling. 



For the Automation of CMM, the input is a given model of the form 

Aiu) = /, (132) 

for the solution m on a given domain il. x (0, T] in space-time, where ^ is a given differential 
operator and where / is a given source term. The output is a discrete solution U ^ u 
satisfying some measure of quality. Typically, the measure of quality is given in the form 
of a tolerance TOL > for the size of the error e = U — u in a suitable norm, ||e|| < TOL, 
or alternatively, the error in some given functional M, 

|M(?7) -M(u)| < TOL. (133) 

In addition to controlling the quality of the computed solution, one may also want to de- 
termine a parameter that optimizes some given cost functional depending on the computed 
solution (optimization). We refer to the overall process, including optimization, as the 
Automation of CMM. 

The key problem for the Automation of CMM is thus the design of a feed-back control 
for the automatic construction of a discrete solution U, satisfying the output condition 
(|133p at minimal cost. The design of this feed-back control is based on the solution of 
an associated dual problem, connecting the size of the residual R{U) = A{U) — f of the 
computed discrete solution to the size of the error e, and thus to the output condition ()133p . 

11.3 An Agenda for the Automation of CMM 

Following our previous discussion on modularization as a basic principle of automation, we 
identify the following key steps in the Automation of CMM: 

(i) the automation of discretization, that is, the automatic translation of a continuous 
model of the form (|13'2p to a system of discrete equations; 

(ii) the automation of discrete solution, that is, the automatic solution of the system of 
discrete equations obtained from the automatic discretization of ()132p . 

(iii) the automation of error control, that is, the automatic selection of an appropriate 
resolution of the discrete model to produce a discrete solution satisfying the given 
accuracy requirement with minimal work; 
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(iv) the automation of modeling, that is, the automatic selection of the model (jl32p . either 
by constructing a model from a given set of data, or by constructing from a given 
model a reduced model for the variation of the solution on resolvable scales; 



(v) the automation of optimization, that is, the automatic selection of a parameter in the 
model (I132p to optimize a given goal functional. 

In Figure UM we demonstrate how (i)-(iv) connect to solve the overall task of the 
Automation of CMM (excluding optimization) in accordance with Figure [T8l We discuss 
(i)-(v) in some detail below. In all cases, feed-back control, or adaptivity, plays a key role. 



Aiu) = f 



TOL 



U 



Aiu) = f + g{u) 



F{x) = 




{Vh,Vh) tol>0 



U 



X K,x 



U 



Figure 19. A modularized view of the Automation of Computational Mathematical 
Modeling. 



11.4 The Automation of Discretization 

The automation of discretization amounts to automatically generating a system of discrete 
equations for the degrees of freedom of a discrete solution U approximating the solution u 
of the given model (|132|) , or alternatively, the solution u £ V oi a corresponding variational 
problem 

a{u;v)=L{v) WveV, (134) 

where as before a : V x V is a semilinear form which is linear in its second argument 
and L : y — )> M is a linear form. As we saw in Section [3l this process may be automated 
by the finite element method, by replacing the function spaces {V, V) with a suitable pair 
(^h)V/i) of discrete function spaces, and an approach to its automation was discussed in 
Sections HHSl As we shall discuss further below, the pair of discrete function spaces may be 
automatically chosen by feed-back control to compute the discrete solution U both reliably 
and efficiently. 
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11.5 The Automation of Discrete Solution 

Depending on the model (jl32p and the method used to automatically discretize the model, 
the resulting system of discrete equations may require more or less work to solve. Typically, 
the discrete system is solved by some iterative method such as the conjugate gradient 
method (CG) or GMRES, in combination with an appropriate choice of preconditioner, see 
for example [TT41I32]. 

The resolution of the discretization of (jl32p may be chosen automatically by feed-back 
control from the computed solution, with the target of minimizing the computational work 
while satisfying a given accuracy requirement. As a consequence, see for example [M], one 
obtains an accuracy requirement on the solution of the system of discrete equations. Thus, 
the system of discrete equations does not need to be solved to within machine precision, but 
only to within some discrete tolerance tol > for some error in a functional of the solution 
of the discrete system. We shall not pursue this question further here, but remark that the 
feed-back control from the computed solution to the iterative algorithm for the solution 
of the system of discrete equations is often weak, and the problem of designing efficient 
adaptive iterative algorithms for the system of discrete equations remains open. 

11.6 The Automation of Error Control 

As stated above, the overall task is to produce a solution of (|132p that satisfies a given 
accuracy requirement with minimal work. This includes an aspect of reliability, that is, the 
error in an output quantity of interest depending on the computed solution should be less 
than a given tolerance, and an aspect of efficiency, that is, the solution should be computed 
with minimal work. Ideally, an algorithm for the solution of (|132p should thus have the 
following properties: Given a tolerance TOL > and a functional M, the algorithm shall 
produce a discrete solution U approximating the exact solution u of (|132p . such that 

(A) \M{U)-M{u)\ < TOL; 

(B) the computational cost of obtaining the approximation U is minimal. 

Conditions (A) and (B) can be satisfied by an adaptive algorithm, with the construction of 
the discrete representation (V/^, V/i) based on feed-back from the computed solution. 

An adaptive algorithm typically involves a stopping criterion, indicating that the size 
of the error is less than the given tolerance, and a modification strategy to be applied if 
the stopping criterion is not satisfied. Often, the stopping criterion and the modification 
strategy are based on an a posteriori error estimate E > \M(U) — M{u)\, estimating the 
error in terms of the residual R{U) = A{U) — f and the solution 99 of a dual problem 
connecting to the stability of (jl32p . 

11.6.1 The dual problem 

The dual problem of (jl32p for the given output functional M is given by 

I'V = V', (135) 

on X [0,T), where A'* denotes the adjoiniF^ of the Prechet derivative A' of A evaluated 
at a suitable mean value of the exact solution u and the computed solution U , 

A'=[ A' {sU + {I - s)u) ds, (136) 
Jo 



The adjoint is defined by {Av, w) — {v, A'w) for all v,w G V such that v — w — Oatt — and t = T. 
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and where ^/J is the Riesz representer of a similar mean value of the Frechet derivative M' 

of M, 

{v,^)=M'v yv£V. (137) 

By the dual problem ()135p . we directly obtain the error representation 

M{U) - M{u) = W{U -u) = (U -u,^) = {U - u, W*ip) = (A^{U -u),ip) 
= {A{U) - A{u), ^) = {A{U) -f,ip) = {R{U),ip). 

Noting now that if the solution U is computed by a Galerkin method and thus {R{U),v) = 
for any v & Vh, we obtain 

M{U)-M{u) = {R{U),ip-7rh^), (139) 

where tt/j^? is a suitable approximation of in V/^. One may now proceed to estimate the 
error M(U) — M{u) in various ways, either by estimating the interpolation error Hh'^ — (p 
or by directly evaluating the quantity {R{U),ip — T^hV)- The residual R{U) and the dual 
solution (p give precise information about the influence of the discrete representation {Vh, Vh) 
on the size of the error, which can be used in an adaptive feed-back control to choose a 
suitable discrete representation for the given output quantity M of interest and the given 
tolerance TOL for the error, see [381 [T71I521IM] . 

11.6.2 The weak dual problem 

We may estimate the error similarly for the variational problem (jl34p by considering the 
following weak (variational) dual problem: Find ip (zV such that 

a'*{U,u; v,ip) =W{U,u;v) VvGY, (140) 

where a'* denotes the adjoint of the bilinear form a', given as above by an appropriate 
mean value of the Frechet derivative of the semilinear form a. We now obtain the error 
representation 

M{U) - M{u) = W{U, u-U -u) = a!*{U, u;U -u,p) = a'{U, u; - u) ^^^^^ 
= a{U ; ip) — a{u; p) = a{U ; p) — L{ip). 

As before, we use the Galerkin orthogonality to subtract a{U ; irp) — L{Trp) = for some 
T^hf ^ Vh y and obtain 

M{U) - M{u) = a{U; p - irp) - L{p - irp). (142) 

To automate the process of error control, we thus need to automatically generate and 
solve the dual problem ()135p or (jl40p from a given primal problem (I132p or (|134p . We 
investigate this question further in |61j . 

11.7 The Automation of Modeling 

The automation of modeling concerns both the problem of finding the parameters describing 
the model p32p from a given set of data (inverse modeling) , and the automatic construction 
of a reduced model for the variation of the solution on resolvable scales (model reduction). 
We here discuss briefly the automation of model reduction. 

In situations where the solution u of (|132p varies on scales of different magnitudes, 
and these scales are not localized in space and time, computation of the solution may be 
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very expensive, even with an adaptive method. To make computation feasible, one may 
instead seek to compute an average u of the solution u of (|132p on resolvable scales. Typical 
examples include meteorological models for weather prediction, with fast time scales on the 
range of seconds and slow time scales on the range of years, or protein folding represented 
by a molecular dynamics model, with fast time scales on the range of femtoseconds and 
slow time scales on the range of microseconds. 

Model reduction typically involves extrapolation from resolvable scales, or the construc- 
tion of a large-scale model from local resolution of fine scales in time and space. In both 
cases, a large-scale model 

A{u) = f + g{u), (143) 

for the average u is constructed from the given model ()132p with a suitable modeling term 
g{u) ^A{u)-A{u). 

Replacing a given model with a computable reduced model by taking averages in space 
and time is sometimes referred to as suhgrid modeling. Subgrid modeling has received much 
attention in recent years, in particular for the incompressible Navier-Stokes equations, 
where the subgrid modeling problem takes the form of determining the Reynolds stresses 
corresponding to g. Many subgrid models have been proposed for the averaged Navier- 
Stokes equations, but no clear answer has been given. Alternatively, the subgrid model 
may take the form of a least-squares stabilization, as suggested in [BSJ EH ES]- In either 
case, the validity of a proposed subgrid model may be verified computationally by solving an 
appropriate dual problem and computing the relevant residuals to obtain an error estimate 
for the modeling error, see j77j . 

11.8 The Automation of Optimization 

The automation of optimization relies on the automation of (i)-(iv), with the solution 
of the primal problem (|132|) and an associated dual problem being the key steps in the 
minimization of a given cost functional. In particular, the automation of optimization 
relies on the automatic generation of the dual problem. 

The optimization of a given cost functional J = J{u,p), subject to the constraint (|132p . 
with p a function (the control variables) to be determined, can be formulated as the problem 
of finding a stationary point of the associated Lagrangian, 

L{u,p,^) = J(u,p) + {A(u,p) - f{p),v), (144) 

which takes the form of a system of differential equations, involving the primal and dual 
problems, as well as an equation expressing stationarity with respect to the control vari- 
ables p, 

A{u,p) = f{p), 

{A'y{u,p)^ = -dJ/du, (145) 
dj/dp = {df/dp)*ip - (dA/dpYip. 

It follows that the optimization problem may be solved by the solution of a system of 
differential equations. Note that the first equation is the given model (|132p . the second 
equation is the dual problem and the third equation gives a direction for the update of the 
control variables. The automation of optimization thus relies on the automated solution 
of both the primal problem ()132p and the dual problem ()135p . including the automatic 
generation of the dual problem. 
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12 CONCLUDING REMARKS 

With the FEniCS project [60], we have the beginnings of a working system automating 
(in part) the finite element method, which is the first step towards the Automation of 
Computational Mathematical Modeling, as outlined in [SS]- As part of this work, a number 
of key components, FIAT, FFC and DOLFIN, have been developed. These components 
provide reference implementations of the algorithms discussed in Sections [SHSl 

As the current toolset, focused mainly on an automation of the finite element method 
(the automation of discretization), is becoming more mature, important new areas of re- 
search and development emerge, including the remaining key steps towards the Automation 
of CMM. In particular, we plan to explore the possibility of automatically generating dual 
problems and error estimates in an effort to automate error control. 
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NOTATION 



A - the differential operator of the model A{u) = f 

A - the global tensor with entries {Ajjjgj 

A'^ ~ the reference tensor with entries {A^^}i^xK.aeA 

A^ - the matrix representation of the (flattened) reference tensor ^4" 

A^ ~ the element tensor with entries {Af}i^xK 

a - the semilinear, multilinear or bilinear form 

ax - the local contribution to a multilinear form a from K 

- the vector representation of the (flattened) element tensor A^ 
A - the set of secondary indices 

B - the set of auxiliary indices 

e - the error, e = U — u 

Fk - the mapping from Kq to K 

Gk - the geometry tensor with entries {G'^}a£A 

qk - the vector representation of the (flattened) geometry tensor Gk 

X — the set 11^=1 [1' -^"'l °f indices for the global tensor A 

Tk - the set 11^=1 [I'^i^] °f indices for the element tensor A^"^ {primary indices) 

lk - the local-to- global mapping from Afx to J\f 

lk - the local-to-global mapping from Mr to J\f 

u^j^ - the local-to-global mapping from 7V^ to M-' 

K - a cell in the mesh T 

Kq - the reference cell 

L - the linear form (functional) on V or Vh 

m - the number of discrete function spaces used in the definition of a 

N - the dimension of Vh and Vh 

- the dimension Vf^ 

Nq - the number of quadrature points on a cell 

no ~ the dimension of Vq 

nx - the dimension of Vk 

fiK - the dimension of Pk 

rv'^ - the dimension of "Pj^ 

M - the set of global nodes on Vh 

M - the set of global nodes on Vh 

- the set of global nodes on 
TVq ~ the set of local nodes on Vq 
Mk - the set of local nodes on Vk 
Mk - the set of local nodes on Vk 
Mir ~ the set of local nodes on T'-L 
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ul - a node on Vq 

vf^ ~ a node on Vk 

vf- - a node on Vk 

vf'^ - a node on Vj^ 

Vq - the function space on for Vh 

pQ - the function space on Kq for Vh 

Vq - the function space on Kq for V^^ 

Vk ~ the local function space on K for Vh 

Vk - the local function space on K for Vh 

Vj^ - the local function space on K for Vj^ 

Pq{K) ~ the space of polynomials of degree < q on K 

Vk - the local function space on K generated by {Vj^}Y=i 

R - the residual, R{U) = A{U) - f 

r — the arity of the multilinear form a (the rank of A and A^^) 

U - the discrete approximate solution, U ^ u 

{Ui) - the vector of expansion coefficients for U = X^^^ Ui(j)i 

u - the exact solution of the given model A{u) = f 

V - the space of trial functions on (the trial space) 

V - the space of test functions on Q (the test space) 

Vh - the space of discrete trial functions on Q (the discrete trial space) 

Vh - the space of discrete test functions on Q (the discrete test space) 

Vjl - a discrete function space on O 

\V\ - the dimension of a vector space V 

$i ~ a basis function in Vo 

$j ~ a basis function in Vo 

- a basis function in Vq 
(pi - a basis function in Vh 

- a basis function in Vh 
(p-'- - a basis function in 
(pf - a basis function in Vk 
(pf - a basis function in Vk 
(pf'-' - a basis function in 

(/9 - the dual solution 

T - the mesh 

- a bounded domain in M'^ 



